2023-03-17 17:13:21

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 1/2] nfsd: don't replace page in rq_pages if it's a continuation of last page

The splice read calls nfsd_splice_actor to put the pages containing file
data into the svc_rqst->rq_pages array. It's possible however to get a
splice result that only has a partial page at the end, if (e.g.) the
filesystem hands back a short read that doesn't cover the whole page.

nfsd_splice_actor will plop the partial page into its rq_pages array and
return. Then later, when nfsd_splice_actor is called again, the
remainder of the page may end up being filled out. At this point,
nfsd_splice_actor will put the page into the array _again_ corrupting
the reply. If this is done enough times, rq_next_page will overrun the
array and corrupt the trailing fields -- the rq_respages and
rq_next_page pointers themselves.

If we've already added the page to the array in the last pass, don't add
it to the array a second time when dealing with a splice continuation.
This was originally handled properly in nfsd_splice_actor, but commit
91e23b1c3982 removed the check for it.

Fixes: 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()")
Cc: Al Viro <[email protected]>
Reported-by: Dario Lesca <[email protected]>
Tested-by: David Critch <[email protected]>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/vfs.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 502e1b7742db..97b38b47c563 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -941,8 +941,14 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
struct page *last_page;

last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
- for (page += offset / PAGE_SIZE; page <= last_page; page++)
- svc_rqst_replace_page(rqstp, page);
+ for (page += offset / PAGE_SIZE; page <= last_page; page++) {
+ /*
+ * Skip page replacement when extending the contents
+ * of the current page.
+ */
+ if (page != *(rqstp->rq_next_page - 1))
+ svc_rqst_replace_page(rqstp, page);
+ }
if (rqstp->rq_res.page_len == 0) // first call
rqstp->rq_res.page_base = offset % PAGE_SIZE;
rqstp->rq_res.page_len += sd->len;
--
2.39.2



2023-03-17 17:13:22

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 2/2] sunrpc: add bounds checking to svc_rqst_replace_page

If rq_next_page ends up pointing outside the array, then we can corrupt
memory when we go to change its value. Ensure that it hasn't strayed
outside the array, and have svc_rqst_replace_page return -EIO without
changing anything if it has.

Fix up nfsd_splice_actor (the only caller) to handle this case by either
returning an error or a short splice when this happens.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/vfs.c | 23 +++++++++++++++++------
include/linux/sunrpc/svc.h | 2 +-
net/sunrpc/svc.c | 14 +++++++++++++-
3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 97b38b47c563..0ebd7a65a9f0 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
struct page *page = buf->page; // may be a compound one
unsigned offset = buf->offset;
struct page *last_page;
+ int ret = 0, consumed = 0;

last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
for (page += offset / PAGE_SIZE; page <= last_page; page++) {
@@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
* Skip page replacement when extending the contents
* of the current page.
*/
- if (page != *(rqstp->rq_next_page - 1))
- svc_rqst_replace_page(rqstp, page);
+ if (page != *(rqstp->rq_next_page - 1)) {
+ ret = svc_rqst_replace_page(rqstp, page);
+ if (ret)
+ break;
+ }
+ consumed += min_t(int,
+ PAGE_SIZE - offset_in_page(offset),
+ sd->len - consumed);
+ offset = 0;
}
- if (rqstp->rq_res.page_len == 0) // first call
- rqstp->rq_res.page_base = offset % PAGE_SIZE;
- rqstp->rq_res.page_len += sd->len;
- return sd->len;
+ if (consumed) {
+ if (rqstp->rq_res.page_len == 0) // first call
+ rqstp->rq_res.page_base = offset % PAGE_SIZE;
+ rqstp->rq_res.page_len += consumed;
+ return consumed;
+ }
+ return ret;
}

static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 877891536c2f..9ea52f143f49 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int,
int (*threadfn)(void *data));
struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
struct svc_pool *pool, int node);
-void svc_rqst_replace_page(struct svc_rqst *rqstp,
+int svc_rqst_replace_page(struct svc_rqst *rqstp,
struct page *page);
void svc_rqst_free(struct svc_rqst *);
void svc_exit_thread(struct svc_rqst *);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index fea7ce8fba14..d624c02f09be 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
* When replacing a page in rq_pages, batch the release of the
* replaced pages to avoid hammering the page allocator.
*/
-void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
+int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
{
+ struct page **begin, **end;
+
+ /*
+ * Bounds check: make sure rq_next_page points into the rq_respages
+ * part of the array.
+ */
+ begin = rqstp->rq_pages;
+ end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
+ if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp->rq_next_page > end))
+ return -EIO;
+
if (*rqstp->rq_next_page) {
if (!pagevec_space(&rqstp->rq_pvec))
__pagevec_release(&rqstp->rq_pvec);
@@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)

get_page(page);
*(rqstp->rq_next_page++) = page;
+ return 0;
}
EXPORT_SYMBOL_GPL(svc_rqst_replace_page);

--
2.39.2


2023-03-17 17:34:13

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] nfsd: don't replace page in rq_pages if it's a continuation of last page



> On Mar 17, 2023, at 1:13 PM, Jeff Layton <[email protected]> wrote:
>
> The splice read calls nfsd_splice_actor to put the pages containing file
> data into the svc_rqst->rq_pages array. It's possible however to get a
> splice result that only has a partial page at the end, if (e.g.) the
> filesystem hands back a short read that doesn't cover the whole page.
>
> nfsd_splice_actor will plop the partial page into its rq_pages array and
> return. Then later, when nfsd_splice_actor is called again, the
> remainder of the page may end up being filled out. At this point,
> nfsd_splice_actor will put the page into the array _again_ corrupting
> the reply. If this is done enough times, rq_next_page will overrun the
> array and corrupt the trailing fields -- the rq_respages and
> rq_next_page pointers themselves.
>
> If we've already added the page to the array in the last pass, don't add
> it to the array a second time when dealing with a splice continuation.
> This was originally handled properly in nfsd_splice_actor, but commit
> 91e23b1c3982 removed the check for it.
>
> Fixes: 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()")
> Cc: Al Viro <[email protected]>
> Reported-by: Dario Lesca <[email protected]>
> Tested-by: David Critch <[email protected]>
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630
> Signed-off-by: Jeff Layton <[email protected]>

I've applied this one internally for due-diligence testing.
I'll wait a couple of days for review comments, so it's
not likely to make v6.3-rc3 on Sunday.


> ---
> fs/nfsd/vfs.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 502e1b7742db..97b38b47c563 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -941,8 +941,14 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> struct page *last_page;
>
> last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
> - for (page += offset / PAGE_SIZE; page <= last_page; page++)
> - svc_rqst_replace_page(rqstp, page);
> + for (page += offset / PAGE_SIZE; page <= last_page; page++) {
> + /*
> + * Skip page replacement when extending the contents
> + * of the current page.
> + */
> + if (page != *(rqstp->rq_next_page - 1))
> + svc_rqst_replace_page(rqstp, page);
> + }
> if (rqstp->rq_res.page_len == 0) // first call
> rqstp->rq_res.page_base = offset % PAGE_SIZE;
> rqstp->rq_res.page_len += sd->len;
> --
> 2.39.2
>

--
Chuck Lever



2023-03-17 17:52:14

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sunrpc: add bounds checking to svc_rqst_replace_page

On Fri, 2023-03-17 at 13:13 -0400, Jeff Layton wrote:
> If rq_next_page ends up pointing outside the array, then we can corrupt
> memory when we go to change its value. Ensure that it hasn't strayed
> outside the array, and have svc_rqst_replace_page return -EIO without
> changing anything if it has.
>
> Fix up nfsd_splice_actor (the only caller) to handle this case by either
> returning an error or a short splice when this happens.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/vfs.c | 23 +++++++++++++++++------
> include/linux/sunrpc/svc.h | 2 +-
> net/sunrpc/svc.c | 14 +++++++++++++-
> 3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 97b38b47c563..0ebd7a65a9f0 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> struct page *page = buf->page; // may be a compound one
> unsigned offset = buf->offset;
> struct page *last_page;
> + int ret = 0, consumed = 0;
>
> last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
> for (page += offset / PAGE_SIZE; page <= last_page; page++) {
> @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> * Skip page replacement when extending the contents
> * of the current page.
> */
> - if (page != *(rqstp->rq_next_page - 1))
> - svc_rqst_replace_page(rqstp, page);
> + if (page != *(rqstp->rq_next_page - 1)) {
> + ret = svc_rqst_replace_page(rqstp, page);
> + if (ret)
> + break;
> + }
> + consumed += min_t(int,
> + PAGE_SIZE - offset_in_page(offset),
> + sd->len - consumed);
> + offset = 0;
> }
> - if (rqstp->rq_res.page_len == 0) // first call
> - rqstp->rq_res.page_base = offset % PAGE_SIZE;
> - rqstp->rq_res.page_len += sd->len;
> - return sd->len;
> + if (consumed) {
> + if (rqstp->rq_res.page_len == 0) // first call
> + rqstp->rq_res.page_base = offset % PAGE_SIZE;

Oops, this will get the page_base wrong if the first splice covers
multiple pages. I'll need to respin the offset handling.

> + rqstp->rq_res.page_len += consumed;
> + return consumed;
> + }
> + return ret;
> }
>
> static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 877891536c2f..9ea52f143f49 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int,
> int (*threadfn)(void *data));
> struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
> struct svc_pool *pool, int node);
> -void svc_rqst_replace_page(struct svc_rqst *rqstp,
> +int svc_rqst_replace_page(struct svc_rqst *rqstp,
> struct page *page);
> void svc_rqst_free(struct svc_rqst *);
> void svc_exit_thread(struct svc_rqst *);
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index fea7ce8fba14..d624c02f09be 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
> * When replacing a page in rq_pages, batch the release of the
> * replaced pages to avoid hammering the page allocator.
> */
> -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
> +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
> {
> + struct page **begin, **end;
> +
> + /*
> + * Bounds check: make sure rq_next_page points into the rq_respages
> + * part of the array.
> + */
> + begin = rqstp->rq_pages;
> + end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
> + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp->rq_next_page > end))
> + return -EIO;
> +
> if (*rqstp->rq_next_page) {
> if (!pagevec_space(&rqstp->rq_pvec))
> __pagevec_release(&rqstp->rq_pvec);
> @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
>
> get_page(page);
> *(rqstp->rq_next_page++) = page;
> + return 0;
> }
> EXPORT_SYMBOL_GPL(svc_rqst_replace_page);
>

--
Jeff Layton <[email protected]>

2023-03-17 17:52:18

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sunrpc: add bounds checking to svc_rqst_replace_page


> On Mar 17, 2023, at 1:13 PM, Jeff Layton <[email protected]> wrote:
>
> If rq_next_page ends up pointing outside the array, then we can corrupt
> memory when we go to change its value. Ensure that it hasn't strayed
> outside the array, and have svc_rqst_replace_page return -EIO without
> changing anything if it has.
>
> Fix up nfsd_splice_actor (the only caller) to handle this case by either
> returning an error or a short splice when this happens.

IMO it's not worth the extra complexity to return a short splice.
This is a "should never happen" scenario in a hot I/O path. Let's
keep this code as simple as possible, and use unlikely() for the
error cases in both nfsd_splice_actor and svc_rqst_replace_page().

Also, since "nfsd_splice_actor ... [is] the only caller", a WARN_ON
stack trace is not adding value. I still think a tracepoint is more
appropriate. I suggest:

trace_svc_replace_page_err(rqst);


> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/vfs.c | 23 +++++++++++++++++------
> include/linux/sunrpc/svc.h | 2 +-
> net/sunrpc/svc.c | 14 +++++++++++++-
> 3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 97b38b47c563..0ebd7a65a9f0 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> struct page *page = buf->page; // may be a compound one
> unsigned offset = buf->offset;
> struct page *last_page;
> + int ret = 0, consumed = 0;
>
> last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
> for (page += offset / PAGE_SIZE; page <= last_page; page++) {
> @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> * Skip page replacement when extending the contents
> * of the current page.
> */
> - if (page != *(rqstp->rq_next_page - 1))
> - svc_rqst_replace_page(rqstp, page);
> + if (page != *(rqstp->rq_next_page - 1)) {
> + ret = svc_rqst_replace_page(rqstp, page);
> + if (ret)
> + break;
> + }
> + consumed += min_t(int,
> + PAGE_SIZE - offset_in_page(offset),
> + sd->len - consumed);
> + offset = 0;
> }
> - if (rqstp->rq_res.page_len == 0) // first call
> - rqstp->rq_res.page_base = offset % PAGE_SIZE;
> - rqstp->rq_res.page_len += sd->len;
> - return sd->len;
> + if (consumed) {
> + if (rqstp->rq_res.page_len == 0) // first call
> + rqstp->rq_res.page_base = offset % PAGE_SIZE;
> + rqstp->rq_res.page_len += consumed;
> + return consumed;
> + }
> + return ret;
> }
>
> static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 877891536c2f..9ea52f143f49 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int,
> int (*threadfn)(void *data));
> struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
> struct svc_pool *pool, int node);
> -void svc_rqst_replace_page(struct svc_rqst *rqstp,
> +int svc_rqst_replace_page(struct svc_rqst *rqstp,
> struct page *page);
> void svc_rqst_free(struct svc_rqst *);
> void svc_exit_thread(struct svc_rqst *);
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index fea7ce8fba14..d624c02f09be 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
> * When replacing a page in rq_pages, batch the release of the
> * replaced pages to avoid hammering the page allocator.
> */
> -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
> +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
> {
> + struct page **begin, **end;
> +
> + /*
> + * Bounds check: make sure rq_next_page points into the rq_respages
> + * part of the array.
> + */
> + begin = rqstp->rq_pages;
> + end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
> + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp->rq_next_page > end))
> + return -EIO;
> +
> if (*rqstp->rq_next_page) {
> if (!pagevec_space(&rqstp->rq_pvec))
> __pagevec_release(&rqstp->rq_pvec);
> @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
>
> get_page(page);
> *(rqstp->rq_next_page++) = page;
> + return 0;
> }
> EXPORT_SYMBOL_GPL(svc_rqst_replace_page);
>
> --
> 2.39.2
>

--
Chuck Lever



2023-03-17 18:04:47

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sunrpc: add bounds checking to svc_rqst_replace_page

On Fri, 2023-03-17 at 17:51 +0000, Chuck Lever III wrote:
> > On Mar 17, 2023, at 1:13 PM, Jeff Layton <[email protected]> wrote:
> >
> > If rq_next_page ends up pointing outside the array, then we can
> > corrupt
> > memory when we go to change its value. Ensure that it hasn't strayed
> > outside the array, and have svc_rqst_replace_page return -EIO
> > without
> > changing anything if it has.
> >
> > Fix up nfsd_splice_actor (the only caller) to handle this case by
> > either
> > returning an error or a short splice when this happens.
>
> IMO it's not worth the extra complexity to return a short splice.
> This is a "should never happen" scenario in a hot I/O path. Let's
> keep this code as simple as possible, and use unlikely() for the
> error cases in both nfsd_splice_actor and svc_rqst_replace_page().
>

Are there any issues with just returning an error even though we have
successfully spliced some of the data into the buffer? I guess the
caller will just see an EIO or whatever instead of the short read in
that case?


> Also, since "nfsd_splice_actor ... [is] the only caller", a WARN_ON
> stack trace is not adding value. I still think a tracepoint is more
> appropriate. I suggest:
>
> ??trace_svc_replace_page_err(rqst);
>
>

Ok, I can look at adding a conditional tracepoint.

> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfsd/vfs.c | 23 +++++++++++++++++------
> > include/linux/sunrpc/svc.h | 2 +-
> > net/sunrpc/svc.c | 14 +++++++++++++-
> > 3 files changed, 31 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 97b38b47c563..0ebd7a65a9f0 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe,
> > struct pipe_buffer *buf,
> > struct page *page = buf->page; // may be a compound one
> > unsigned offset = buf->offset;
> > struct page *last_page;
> > + int ret = 0, consumed = 0;
> >
> > last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
> > for (page += offset / PAGE_SIZE; page <= last_page; page++)
> > {
> > @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info
> > *pipe, struct pipe_buffer *buf,
> > * Skip page replacement when extending the
> > contents
> > * of the current page.
> > */
> > - if (page != *(rqstp->rq_next_page - 1))
> > - svc_rqst_replace_page(rqstp, page);
> > + if (page != *(rqstp->rq_next_page - 1)) {
> > + ret = svc_rqst_replace_page(rqstp, page);
> > + if (ret)
> > + break;
> > + }
> > + consumed += min_t(int,
> > + PAGE_SIZE -
> > offset_in_page(offset),
> > + sd->len - consumed);
> > + offset = 0;
> > }
> > - if (rqstp->rq_res.page_len == 0) // first call
> > - rqstp->rq_res.page_base = offset % PAGE_SIZE;
> > - rqstp->rq_res.page_len += sd->len;
> > - return sd->len;
> > + if (consumed) {
> > + if (rqstp->rq_res.page_len == 0) // first
> > call
> > + rqstp->rq_res.page_base = offset %
> > PAGE_SIZE;
> > + rqstp->rq_res.page_len += consumed;
> > + return consumed;
> > + }
> > + return ret;
> > }
> >
> > static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
> > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > index 877891536c2f..9ea52f143f49 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program
> > *, unsigned int,
> > int (*threadfn)(void *data));
> > struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
> > struct svc_pool *pool, int
> > node);
> > -void svc_rqst_replace_page(struct svc_rqst *rqstp,
> > +int svc_rqst_replace_page(struct svc_rqst *rqstp,
> > struct page *page);
> > void svc_rqst_free(struct svc_rqst *);
> > void svc_exit_thread(struct svc_rqst *);
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index fea7ce8fba14..d624c02f09be 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
> > ?* When replacing a page in rq_pages, batch the release of the
> > ?* replaced pages to avoid hammering the page allocator.
> > ?*/
> > -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page
> > *page)
> > +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page
> > *page)
> > {
> > + struct page **begin, **end;
> > +
> > + /*
> > + * Bounds check: make sure rq_next_page points into the
> > rq_respages
> > + * part of the array.
> > + */
> > + begin = rqstp->rq_pages;
> > + end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
> > + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp-
> > >rq_next_page > end))
> > + return -EIO;
> > +
> > if (*rqstp->rq_next_page) {
> > if (!pagevec_space(&rqstp->rq_pvec))
> > __pagevec_release(&rqstp->rq_pvec);
> > @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst
> > *rqstp, struct page *page)
> >
> > get_page(page);
> > *(rqstp->rq_next_page++) = page;
> > + return 0;
> > }
> > EXPORT_SYMBOL_GPL(svc_rqst_replace_page);
> >
> > --
> > 2.39.2
> >
>
> --
> Chuck Lever
>
>

--
Jeff Layton <[email protected]>

2023-03-17 18:08:40

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sunrpc: add bounds checking to svc_rqst_replace_page



> On Mar 17, 2023, at 2:04 PM, Jeff Layton <[email protected]> wrote:
>
> On Fri, 2023-03-17 at 17:51 +0000, Chuck Lever III wrote:
>>> On Mar 17, 2023, at 1:13 PM, Jeff Layton <[email protected]> wrote:
>>>
>>> If rq_next_page ends up pointing outside the array, then we can
>>> corrupt
>>> memory when we go to change its value. Ensure that it hasn't strayed
>>> outside the array, and have svc_rqst_replace_page return -EIO
>>> without
>>> changing anything if it has.
>>>
>>> Fix up nfsd_splice_actor (the only caller) to handle this case by
>>> either
>>> returning an error or a short splice when this happens.
>>
>> IMO it's not worth the extra complexity to return a short splice.
>> This is a "should never happen" scenario in a hot I/O path. Let's
>> keep this code as simple as possible, and use unlikely() for the
>> error cases in both nfsd_splice_actor and svc_rqst_replace_page().
>>
>
> Are there any issues with just returning an error even though we have
> successfully spliced some of the data into the buffer? I guess the
> caller will just see an EIO or whatever instead of the short read in
> that case?

NFSv4 READ is probably going to truncate the XDR buffer. I'm not
sure NFSv3 is so clever, so you should test it.


>> Also, since "nfsd_splice_actor ... [is] the only caller", a WARN_ON
>> stack trace is not adding value. I still think a tracepoint is more
>> appropriate. I suggest:
>>
>> trace_svc_replace_page_err(rqst);
>
> Ok, I can look at adding a conditional tracepoint.

I thought about that: it doesn't help much, since you have to
explicitly test anyway to see whether to return an error.


>>> Signed-off-by: Jeff Layton <[email protected]>
>>> ---
>>> fs/nfsd/vfs.c | 23 +++++++++++++++++------
>>> include/linux/sunrpc/svc.h | 2 +-
>>> net/sunrpc/svc.c | 14 +++++++++++++-
>>> 3 files changed, 31 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index 97b38b47c563..0ebd7a65a9f0 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe,
>>> struct pipe_buffer *buf,
>>> struct page *page = buf->page; // may be a compound one
>>> unsigned offset = buf->offset;
>>> struct page *last_page;
>>> + int ret = 0, consumed = 0;
>>>
>>> last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
>>> for (page += offset / PAGE_SIZE; page <= last_page; page++)
>>> {
>>> @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info
>>> *pipe, struct pipe_buffer *buf,
>>> * Skip page replacement when extending the
>>> contents
>>> * of the current page.
>>> */
>>> - if (page != *(rqstp->rq_next_page - 1))
>>> - svc_rqst_replace_page(rqstp, page);
>>> + if (page != *(rqstp->rq_next_page - 1)) {
>>> + ret = svc_rqst_replace_page(rqstp, page);
>>> + if (ret)
>>> + break;
>>> + }
>>> + consumed += min_t(int,
>>> + PAGE_SIZE -
>>> offset_in_page(offset),
>>> + sd->len - consumed);
>>> + offset = 0;
>>> }
>>> - if (rqstp->rq_res.page_len == 0) // first call
>>> - rqstp->rq_res.page_base = offset % PAGE_SIZE;
>>> - rqstp->rq_res.page_len += sd->len;
>>> - return sd->len;
>>> + if (consumed) {
>>> + if (rqstp->rq_res.page_len == 0) // first
>>> call
>>> + rqstp->rq_res.page_base = offset %
>>> PAGE_SIZE;
>>> + rqstp->rq_res.page_len += consumed;
>>> + return consumed;
>>> + }
>>> + return ret;
>>> }
>>>
>>> static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
>>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
>>> index 877891536c2f..9ea52f143f49 100644
>>> --- a/include/linux/sunrpc/svc.h
>>> +++ b/include/linux/sunrpc/svc.h
>>> @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program
>>> *, unsigned int,
>>> int (*threadfn)(void *data));
>>> struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
>>> struct svc_pool *pool, int
>>> node);
>>> -void svc_rqst_replace_page(struct svc_rqst *rqstp,
>>> +int svc_rqst_replace_page(struct svc_rqst *rqstp,
>>> struct page *page);
>>> void svc_rqst_free(struct svc_rqst *);
>>> void svc_exit_thread(struct svc_rqst *);
>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
>>> index fea7ce8fba14..d624c02f09be 100644
>>> --- a/net/sunrpc/svc.c
>>> +++ b/net/sunrpc/svc.c
>>> @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
>>> * When replacing a page in rq_pages, batch the release of the
>>> * replaced pages to avoid hammering the page allocator.
>>> */
>>> -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page
>>> *page)
>>> +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page
>>> *page)
>>> {
>>> + struct page **begin, **end;
>>> +
>>> + /*
>>> + * Bounds check: make sure rq_next_page points into the
>>> rq_respages
>>> + * part of the array.
>>> + */
>>> + begin = rqstp->rq_pages;
>>> + end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
>>> + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp-
>>>> rq_next_page > end))
>>> + return -EIO;
>>> +
>>> if (*rqstp->rq_next_page) {
>>> if (!pagevec_space(&rqstp->rq_pvec))
>>> __pagevec_release(&rqstp->rq_pvec);
>>> @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst
>>> *rqstp, struct page *page)
>>>
>>> get_page(page);
>>> *(rqstp->rq_next_page++) = page;
>>> + return 0;
>>> }
>>> EXPORT_SYMBOL_GPL(svc_rqst_replace_page);
>>>
>>> --
>>> 2.39.2
>>>
>>
>> --
>> Chuck Lever
>>
>>
>
> --
> Jeff Layton <[email protected]>

--
Chuck Lever



2023-03-17 18:32:52

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sunrpc: add bounds checking to svc_rqst_replace_page



> On Mar 17, 2023, at 1:13 PM, Jeff Layton <[email protected]> wrote:
>
> If rq_next_page ends up pointing outside the array, then we can corrupt
> memory when we go to change its value. Ensure that it hasn't strayed
> outside the array, and have svc_rqst_replace_page return -EIO without
> changing anything if it has.
>
> Fix up nfsd_splice_actor (the only caller) to handle this case by either
> returning an error or a short splice when this happens.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/vfs.c | 23 +++++++++++++++++------
> include/linux/sunrpc/svc.h | 2 +-
> net/sunrpc/svc.c | 14 +++++++++++++-
> 3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 97b38b47c563..0ebd7a65a9f0 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> struct page *page = buf->page; // may be a compound one
> unsigned offset = buf->offset;
> struct page *last_page;
> + int ret = 0, consumed = 0;
>
> last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
> for (page += offset / PAGE_SIZE; page <= last_page; page++) {
> @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> * Skip page replacement when extending the contents
> * of the current page.
> */
> - if (page != *(rqstp->rq_next_page - 1))
> - svc_rqst_replace_page(rqstp, page);
> + if (page != *(rqstp->rq_next_page - 1)) {
> + ret = svc_rqst_replace_page(rqstp, page);
> + if (ret)
> + break;
> + }
> + consumed += min_t(int,
> + PAGE_SIZE - offset_in_page(offset),
> + sd->len - consumed);
> + offset = 0;
> }
> - if (rqstp->rq_res.page_len == 0) // first call
> - rqstp->rq_res.page_base = offset % PAGE_SIZE;
> - rqstp->rq_res.page_len += sd->len;
> - return sd->len;
> + if (consumed) {
> + if (rqstp->rq_res.page_len == 0) // first call
> + rqstp->rq_res.page_base = offset % PAGE_SIZE;
> + rqstp->rq_res.page_len += consumed;
> + return consumed;
> + }
> + return ret;
> }
>
> static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 877891536c2f..9ea52f143f49 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int,
> int (*threadfn)(void *data));
> struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
> struct svc_pool *pool, int node);
> -void svc_rqst_replace_page(struct svc_rqst *rqstp,
> +int svc_rqst_replace_page(struct svc_rqst *rqstp,
> struct page *page);
> void svc_rqst_free(struct svc_rqst *);
> void svc_exit_thread(struct svc_rqst *);
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index fea7ce8fba14..d624c02f09be 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
> * When replacing a page in rq_pages, batch the release of the
> * replaced pages to avoid hammering the page allocator.
> */
> -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
> +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
> {
> + struct page **begin, **end;
> +
> + /*
> + * Bounds check: make sure rq_next_page points into the rq_respages
> + * part of the array.
> + */
> + begin = rqstp->rq_pages;
> + end = &rqstp->rq_pages[RPCSVC_MAXPAGES];

Would it make sense to use rq_page_end here? Just a thought.


> + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp->rq_next_page > end))
> + return -EIO;
> +
> if (*rqstp->rq_next_page) {
> if (!pagevec_space(&rqstp->rq_pvec))
> __pagevec_release(&rqstp->rq_pvec);
> @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
>
> get_page(page);
> *(rqstp->rq_next_page++) = page;
> + return 0;
> }
> EXPORT_SYMBOL_GPL(svc_rqst_replace_page);
>
> --
> 2.39.2
>

--
Chuck Lever



2023-03-17 18:52:48

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sunrpc: add bounds checking to svc_rqst_replace_page

On Fri, 2023-03-17 at 18:32 +0000, Chuck Lever III wrote:
>
> > On Mar 17, 2023, at 1:13 PM, Jeff Layton <[email protected]> wrote:
> >
> > If rq_next_page ends up pointing outside the array, then we can corrupt
> > memory when we go to change its value. Ensure that it hasn't strayed
> > outside the array, and have svc_rqst_replace_page return -EIO without
> > changing anything if it has.
> >
> > Fix up nfsd_splice_actor (the only caller) to handle this case by either
> > returning an error or a short splice when this happens.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfsd/vfs.c | 23 +++++++++++++++++------
> > include/linux/sunrpc/svc.h | 2 +-
> > net/sunrpc/svc.c | 14 +++++++++++++-
> > 3 files changed, 31 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 97b38b47c563..0ebd7a65a9f0 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> > struct page *page = buf->page; // may be a compound one
> > unsigned offset = buf->offset;
> > struct page *last_page;
> > + int ret = 0, consumed = 0;
> >
> > last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
> > for (page += offset / PAGE_SIZE; page <= last_page; page++) {
> > @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> > * Skip page replacement when extending the contents
> > * of the current page.
> > */
> > - if (page != *(rqstp->rq_next_page - 1))
> > - svc_rqst_replace_page(rqstp, page);
> > + if (page != *(rqstp->rq_next_page - 1)) {
> > + ret = svc_rqst_replace_page(rqstp, page);
> > + if (ret)
> > + break;
> > + }
> > + consumed += min_t(int,
> > + PAGE_SIZE - offset_in_page(offset),
> > + sd->len - consumed);
> > + offset = 0;
> > }
> > - if (rqstp->rq_res.page_len == 0) // first call
> > - rqstp->rq_res.page_base = offset % PAGE_SIZE;
> > - rqstp->rq_res.page_len += sd->len;
> > - return sd->len;
> > + if (consumed) {
> > + if (rqstp->rq_res.page_len == 0) // first call
> > + rqstp->rq_res.page_base = offset % PAGE_SIZE;
> > + rqstp->rq_res.page_len += consumed;
> > + return consumed;
> > + }
> > + return ret;
> > }
> >
> > static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
> > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > index 877891536c2f..9ea52f143f49 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int,
> > int (*threadfn)(void *data));
> > struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
> > struct svc_pool *pool, int node);
> > -void svc_rqst_replace_page(struct svc_rqst *rqstp,
> > +int svc_rqst_replace_page(struct svc_rqst *rqstp,
> > struct page *page);
> > void svc_rqst_free(struct svc_rqst *);
> > void svc_exit_thread(struct svc_rqst *);
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index fea7ce8fba14..d624c02f09be 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
> > * When replacing a page in rq_pages, batch the release of the
> > * replaced pages to avoid hammering the page allocator.
> > */
> > -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
> > +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
> > {
> > + struct page **begin, **end;
> > +
> > + /*
> > + * Bounds check: make sure rq_next_page points into the rq_respages
> > + * part of the array.
> > + */
> > + begin = rqstp->rq_pages;
> > + end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
>
> Would it make sense to use rq_page_end here? Just a thought.
>

That should be set to the right value, but do you trust that it wouldn't
be corrupt if rq_pages got overrun? It's only 3 pointers beyond the end
of the array. This should work even if rq_page_end gets clobbered.

>
> > + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp->rq_next_page > end))
> > + return -EIO;
> > +
> > if (*rqstp->rq_next_page) {
> > if (!pagevec_space(&rqstp->rq_pvec))
> > __pagevec_release(&rqstp->rq_pvec);
> > @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
> >
> > get_page(page);
> > *(rqstp->rq_next_page++) = page;
> > + return 0;
> > }
> > EXPORT_SYMBOL_GPL(svc_rqst_replace_page);
> >
> > --
> > 2.39.2
> >
>
> --
> Chuck Lever
>
>

--
Jeff Layton <[email protected]>

2023-03-17 19:00:11

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sunrpc: add bounds checking to svc_rqst_replace_page

On Fri, 2023-03-17 at 18:08 +0000, Chuck Lever III wrote:
>
> > On Mar 17, 2023, at 2:04 PM, Jeff Layton <[email protected]> wrote:
> >
> > On Fri, 2023-03-17 at 17:51 +0000, Chuck Lever III wrote:
> > > > On Mar 17, 2023, at 1:13 PM, Jeff Layton <[email protected]> wrote:
> > > >
> > > > If rq_next_page ends up pointing outside the array, then we can
> > > > corrupt
> > > > memory when we go to change its value. Ensure that it hasn't strayed
> > > > outside the array, and have svc_rqst_replace_page return -EIO
> > > > without
> > > > changing anything if it has.
> > > >
> > > > Fix up nfsd_splice_actor (the only caller) to handle this case by
> > > > either
> > > > returning an error or a short splice when this happens.
> > >
> > > IMO it's not worth the extra complexity to return a short splice.
> > > This is a "should never happen" scenario in a hot I/O path. Let's
> > > keep this code as simple as possible, and use unlikely() for the
> > > error cases in both nfsd_splice_actor and svc_rqst_replace_page().
> > >
> >
> > Are there any issues with just returning an error even though we have
> > successfully spliced some of the data into the buffer? I guess the
> > caller will just see an EIO or whatever instead of the short read in
> > that case?
>
> NFSv4 READ is probably going to truncate the XDR buffer. I'm not
> sure NFSv3 is so clever, so you should test it.
>

Honestly, I don't have the cycles to do that sort of fault injection
testing for this. If you think handling it as a short read is overblown,
then tell me what you would like see here.

> > > Also, since "nfsd_splice_actor ... [is] the only caller", a WARN_ON
> > > stack trace is not adding value. I still think a tracepoint is more
> > > appropriate. I suggest:
> > >
> > > trace_svc_replace_page_err(rqst);
> >
> > Ok, I can look at adding a conditional tracepoint.
>
> I thought about that: it doesn't help much, since you have to
> explicitly test anyway to see whether to return an error.
>

Fair point. I can just make it a regular tracepoint that fires inside
the conditional.

>
> > > > Signed-off-by: Jeff Layton <[email protected]>
> > > > ---
> > > > fs/nfsd/vfs.c | 23 +++++++++++++++++------
> > > > include/linux/sunrpc/svc.h | 2 +-
> > > > net/sunrpc/svc.c | 14 +++++++++++++-
> > > > 3 files changed, 31 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > > index 97b38b47c563..0ebd7a65a9f0 100644
> > > > --- a/fs/nfsd/vfs.c
> > > > +++ b/fs/nfsd/vfs.c
> > > > @@ -939,6 +939,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe,
> > > > struct pipe_buffer *buf,
> > > > struct page *page = buf->page; // may be a compound one
> > > > unsigned offset = buf->offset;
> > > > struct page *last_page;
> > > > + int ret = 0, consumed = 0;
> > > >
> > > > last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
> > > > for (page += offset / PAGE_SIZE; page <= last_page; page++)
> > > > {
> > > > @@ -946,13 +947,23 @@ nfsd_splice_actor(struct pipe_inode_info
> > > > *pipe, struct pipe_buffer *buf,
> > > > * Skip page replacement when extending the
> > > > contents
> > > > * of the current page.
> > > > */
> > > > - if (page != *(rqstp->rq_next_page - 1))
> > > > - svc_rqst_replace_page(rqstp, page);
> > > > + if (page != *(rqstp->rq_next_page - 1)) {
> > > > + ret = svc_rqst_replace_page(rqstp, page);
> > > > + if (ret)
> > > > + break;
> > > > + }
> > > > + consumed += min_t(int,
> > > > + PAGE_SIZE -
> > > > offset_in_page(offset),
> > > > + sd->len - consumed);
> > > > + offset = 0;
> > > > }
> > > > - if (rqstp->rq_res.page_len == 0) // first call
> > > > - rqstp->rq_res.page_base = offset % PAGE_SIZE;
> > > > - rqstp->rq_res.page_len += sd->len;
> > > > - return sd->len;
> > > > + if (consumed) {
> > > > + if (rqstp->rq_res.page_len == 0) // first
> > > > call
> > > > + rqstp->rq_res.page_base = offset %
> > > > PAGE_SIZE;
> > > > + rqstp->rq_res.page_len += consumed;
> > > > + return consumed;
> > > > + }
> > > > + return ret;
> > > > }
> > > >
> > > > static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
> > > > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > > > index 877891536c2f..9ea52f143f49 100644
> > > > --- a/include/linux/sunrpc/svc.h
> > > > +++ b/include/linux/sunrpc/svc.h
> > > > @@ -422,7 +422,7 @@ struct svc_serv *svc_create(struct svc_program
> > > > *, unsigned int,
> > > > int (*threadfn)(void *data));
> > > > struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
> > > > struct svc_pool *pool, int
> > > > node);
> > > > -void svc_rqst_replace_page(struct svc_rqst *rqstp,
> > > > +int svc_rqst_replace_page(struct svc_rqst *rqstp,
> > > > struct page *page);
> > > > void svc_rqst_free(struct svc_rqst *);
> > > > void svc_exit_thread(struct svc_rqst *);
> > > > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > > > index fea7ce8fba14..d624c02f09be 100644
> > > > --- a/net/sunrpc/svc.c
> > > > +++ b/net/sunrpc/svc.c
> > > > @@ -843,8 +843,19 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
> > > > * When replacing a page in rq_pages, batch the release of the
> > > > * replaced pages to avoid hammering the page allocator.
> > > > */
> > > > -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page
> > > > *page)
> > > > +int svc_rqst_replace_page(struct svc_rqst *rqstp, struct page
> > > > *page)
> > > > {
> > > > + struct page **begin, **end;
> > > > +
> > > > + /*
> > > > + * Bounds check: make sure rq_next_page points into the
> > > > rq_respages
> > > > + * part of the array.
> > > > + */
> > > > + begin = rqstp->rq_pages;
> > > > + end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
> > > > + if (WARN_ON_ONCE(rqstp->rq_next_page < begin || rqstp-
> > > > > rq_next_page > end))
> > > > + return -EIO;
> > > > +
> > > > if (*rqstp->rq_next_page) {
> > > > if (!pagevec_space(&rqstp->rq_pvec))
> > > > __pagevec_release(&rqstp->rq_pvec);
> > > > @@ -853,6 +864,7 @@ void svc_rqst_replace_page(struct svc_rqst
> > > > *rqstp, struct page *page)
> > > >
> > > > get_page(page);
> > > > *(rqstp->rq_next_page++) = page;
> > > > + return 0;
> > > > }
> > > > EXPORT_SYMBOL_GPL(svc_rqst_replace_page);
> > > >
> > > > --
> > > > 2.39.2
> > > >
> > >
> > > --
> > > Chuck Lever
> > >
> > >
> >
> > --
> > Jeff Layton <[email protected]>
>
> --
> Chuck Lever
>
>

--
Jeff Layton <[email protected]>

2023-03-17 20:55:55

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sunrpc: add bounds checking to svc_rqst_replace_page



> On Mar 17, 2023, at 2:59 PM, Jeff Layton <[email protected]> wrote:
>
> On Fri, 2023-03-17 at 18:08 +0000, Chuck Lever III wrote:
>>
>>> On Mar 17, 2023, at 2:04 PM, Jeff Layton <[email protected]> wrote:
>>>
>>> On Fri, 2023-03-17 at 17:51 +0000, Chuck Lever III wrote:
>>>>> On Mar 17, 2023, at 1:13 PM, Jeff Layton <[email protected]> wrote:
>>>>>
>>>>> If rq_next_page ends up pointing outside the array, then we can
>>>>> corrupt
>>>>> memory when we go to change its value. Ensure that it hasn't strayed
>>>>> outside the array, and have svc_rqst_replace_page return -EIO
>>>>> without
>>>>> changing anything if it has.
>>>>>
>>>>> Fix up nfsd_splice_actor (the only caller) to handle this case by
>>>>> either
>>>>> returning an error or a short splice when this happens.
>>>>
>>>> IMO it's not worth the extra complexity to return a short splice.
>>>> This is a "should never happen" scenario in a hot I/O path. Let's
>>>> keep this code as simple as possible, and use unlikely() for the
>>>> error cases in both nfsd_splice_actor and svc_rqst_replace_page().
>>>>
>>>
>>> Are there any issues with just returning an error even though we have
>>> successfully spliced some of the data into the buffer? I guess the
>>> caller will just see an EIO or whatever instead of the short read in
>>> that case?
>>
>> NFSv4 READ is probably going to truncate the XDR buffer. I'm not
>> sure NFSv3 is so clever, so you should test it.
>
> Honestly, I don't have the cycles to do that sort of fault injection
> testing for this.

nfsd_splice_actor() has never returned an error, so IMO it is
necessary to confirm that when svc_rqst_replace_page() returns
an error, it doesn't create further problems. I don't see how
we can avoid some kind of simple fault injection while developing
the fix.

Tell you what, I can take it from here if you'd like.


> If you think handling it as a short read is overblown,
> then tell me what you would like see here.

It's not the short reads that bugs me, it's the additional
code in a hot path that is worrisome.

--
Chuck Lever



2023-03-17 22:10:24

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sunrpc: add bounds checking to svc_rqst_replace_page

On Fri, 2023-03-17 at 20:55 +0000, Chuck Lever III wrote:
>
> > On Mar 17, 2023, at 2:59 PM, Jeff Layton <[email protected]> wrote:
> >
> > On Fri, 2023-03-17 at 18:08 +0000, Chuck Lever III wrote:
> > >
> > > > On Mar 17, 2023, at 2:04 PM, Jeff Layton <[email protected]> wrote:
> > > >
> > > > On Fri, 2023-03-17 at 17:51 +0000, Chuck Lever III wrote:
> > > > > > On Mar 17, 2023, at 1:13 PM, Jeff Layton <[email protected]> wrote:
> > > > > >
> > > > > > If rq_next_page ends up pointing outside the array, then we can
> > > > > > corrupt
> > > > > > memory when we go to change its value. Ensure that it hasn't strayed
> > > > > > outside the array, and have svc_rqst_replace_page return -EIO
> > > > > > without
> > > > > > changing anything if it has.
> > > > > >
> > > > > > Fix up nfsd_splice_actor (the only caller) to handle this case by
> > > > > > either
> > > > > > returning an error or a short splice when this happens.
> > > > >
> > > > > IMO it's not worth the extra complexity to return a short splice.
> > > > > This is a "should never happen" scenario in a hot I/O path. Let's
> > > > > keep this code as simple as possible, and use unlikely() for the
> > > > > error cases in both nfsd_splice_actor and svc_rqst_replace_page().
> > > > >
> > > >
> > > > Are there any issues with just returning an error even though we have
> > > > successfully spliced some of the data into the buffer? I guess the
> > > > caller will just see an EIO or whatever instead of the short read in
> > > > that case?
> > >
> > > NFSv4 READ is probably going to truncate the XDR buffer. I'm not
> > > sure NFSv3 is so clever, so you should test it.
> >
> > Honestly, I don't have the cycles to do that sort of fault injection
> > testing for this.
>
> nfsd_splice_actor() has never returned an error, so IMO it is
> necessary to confirm that when svc_rqst_replace_page() returns
> an error, it doesn't create further problems. I don't see how
> we can avoid some kind of simple fault injection while developing
> the fix.
>
> Tell you what, I can take it from here if you'd like.
>

Sure! All yours!

>
> > If you think handling it as a short read is overblown,
> > then tell me what you would like see here.
>
> It's not the short reads that bugs me, it's the additional
> code in a hot path that is worrisome.
>

I wouldn't think tracking some extra stuff on the stack would show up
much in profiles, but it's your call.

--
Jeff Layton <[email protected]>