2019-03-29 14:23:54

by Murphy Zhou

[permalink] [raw]
Subject: [PATCH] nfsd/nfsd3_proc_readdir: check before encoding on temporary page

After this commit
f875a79 nfsd: allow nfsv3 readdir request to be larger.
nfsv3 readdir request size can be larger than PAGE_SIZE. So if the
request and the directory are large enough, we can run out of pages
in rq_respages. Then the temporary page we are encoding on is a
random page, Oops happen.

Listing a directory with 30000 files in it can trigger the panic.

Fixing this by ensuring the temporary page resides in rq_respages.
And when counting how many bytes left currently from buffer to the
end of the page which buffer is pointing to, aka len1, do not
assume that curr_page_addr is at the beginning of the same page.
Also, update resp->count by going through all pages. Because the
pages may not be sequential, the old way is not safe.

Fixes: f875a79 "nfsd: allow nfsv3 readdir request to be larger"
Signed-off-by: Murphy Zhou <[email protected]>
---
fs/nfsd/nfs3proc.c | 17 +++++++++++++++--
fs/nfsd/nfs3xdr.c | 8 +++++---
2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 8f933e84cec1..9bc32af4e2da 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -442,7 +442,9 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
struct nfsd3_readdirargs *argp = rqstp->rq_argp;
struct nfsd3_readdirres *resp = rqstp->rq_resp;
__be32 nfserr;
- int count;
+ int count = 0;
+ struct page **p;
+ caddr_t page_addr = NULL;

dprintk("nfsd: READDIR(3) %s %d bytes at %d\n",
SVCFH_fmt(&argp->fh),
@@ -462,7 +464,18 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
nfserr = nfsd_readdir(rqstp, &resp->fh, (loff_t*) &argp->cookie,
&resp->common, nfs3svc_encode_entry);
memcpy(resp->verf, argp->verf, 8);
- resp->count = resp->buffer - argp->buffer;
+ count = 0;
+ for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) {
+ page_addr = page_address(*p);
+
+ if (((caddr_t)resp->buffer >= page_addr) &&
+ ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) {
+ count += (caddr_t)resp->buffer - page_addr;
+ break;
+ }
+ count += PAGE_SIZE;
+ }
+ resp->count = count >> 2;
if (resp->offset) {
loff_t offset = argp->cookie;

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 93fea246f676..1fabf1952bdb 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -953,6 +953,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
break;
}

+
if ((caddr_t)(cd->buffer + elen) < (curr_page_addr + PAGE_SIZE)) {
/* encode entry in current page */

@@ -961,11 +962,12 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
if (plus)
p = encode_entryplus_baggage(cd, p, name, namlen, ino);
num_entry_words = p - cd->buffer;
- } else if (*(page+1) != NULL) {
+ } else if (*(page+1) != NULL && (page+1) < cd->rqstp->rq_next_page) {
/* temporarily encode entry into next page, then move back to
* current and next page in rq_respages[] */
__be32 *p1, *tmp;
int len1, len2;
+ caddr_t tmp_page_addr = NULL;

/* grab next page for temporary storage of entry */
p1 = tmp = page_address(*(page+1));
@@ -977,7 +979,8 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,

/* determine entry word length and lengths to go in pages */
num_entry_words = p1 - tmp;
- len1 = curr_page_addr + PAGE_SIZE - (caddr_t)cd->buffer;
+ tmp_page_addr = (caddr_t)((unsigned long)cd->buffer & PAGE_MASK);
+ len1 = tmp_page_addr + PAGE_SIZE - (caddr_t)cd->buffer;
if ((num_entry_words << 2) < len1) {
/* the actual number of words in the entry is less
* than elen and can still fit in the current page
@@ -1026,7 +1029,6 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
cd->buffer = p;
cd->common.err = nfs_ok;
return 0;
-
}

int
--
2.21.0



2019-04-01 01:20:39

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] nfsd/nfsd3_proc_readdir: check before encoding on temporary page

On Fri, Mar 29 2019, Murphy Zhou wrote:

> After this commit
> f875a79 nfsd: allow nfsv3 readdir request to be larger.
> nfsv3 readdir request size can be larger than PAGE_SIZE. So if the
> request and the directory are large enough, we can run out of pages
> in rq_respages. Then the temporary page we are encoding on is a
> random page, Oops happen.
>
> Listing a directory with 30000 files in it can trigger the panic.
>
> Fixing this by ensuring the temporary page resides in rq_respages.
> And when counting how many bytes left currently from buffer to the
> end of the page which buffer is pointing to, aka len1, do not
> assume that curr_page_addr is at the beginning of the same page.
> Also, update resp->count by going through all pages. Because the
> pages may not be sequential, the old way is not safe.
>
> Fixes: f875a79 "nfsd: allow nfsv3 readdir request to be larger"
> Signed-off-by: Murphy Zhou <[email protected]>

Hi,
thanks for finding these problems and submitting a patch. Clearly I
should have tested better :-(

> ---
> fs/nfsd/nfs3proc.c | 17 +++++++++++++++--
> fs/nfsd/nfs3xdr.c | 8 +++++---
> 2 files changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index 8f933e84cec1..9bc32af4e2da 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -442,7 +442,9 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
> struct nfsd3_readdirargs *argp = rqstp->rq_argp;
> struct nfsd3_readdirres *resp = rqstp->rq_resp;
> __be32 nfserr;
> - int count;
> + int count = 0;
> + struct page **p;
> + caddr_t page_addr = NULL;
>
> dprintk("nfsd: READDIR(3) %s %d bytes at %d\n",
> SVCFH_fmt(&argp->fh),
> @@ -462,7 +464,18 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
> nfserr = nfsd_readdir(rqstp, &resp->fh, (loff_t*) &argp->cookie,
> &resp->common, nfs3svc_encode_entry);
> memcpy(resp->verf, argp->verf, 8);
> - resp->count = resp->buffer - argp->buffer;
> + count = 0;
> + for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) {
> + page_addr = page_address(*p);
> +
> + if (((caddr_t)resp->buffer >= page_addr) &&
> + ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) {
> + count += (caddr_t)resp->buffer - page_addr;
> + break;
> + }
> + count += PAGE_SIZE;
> + }
> + resp->count = count >> 2;
> if (resp->offset) {
> loff_t offset = argp->cookie;

This section makes perfect sense. You have copied code from
nfsd3_proc_readdirplus() int nfsd3_proc_readdir(). This is needed
because readdir doesn't limit replies to PAGE_SIZE any more.

>
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index 93fea246f676..1fabf1952bdb 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -953,6 +953,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> break;
> }
>
> +

Adding blank lines is best avoided.

> if ((caddr_t)(cd->buffer + elen) < (curr_page_addr + PAGE_SIZE)) {
> /* encode entry in current page */
>
> @@ -961,11 +962,12 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> if (plus)
> p = encode_entryplus_baggage(cd, p, name, namlen, ino);
> num_entry_words = p - cd->buffer;
> - } else if (*(page+1) != NULL) {
> + } else if (*(page+1) != NULL && (page+1) < cd->rqstp->rq_next_page) {

Adding the test against rq_next_page looks correct.
nfs3svc_decode_readdirplusargs increments ->rq_next_page over enough
pages to cover the request count.
However nfs4svc_decode_readdirargs doesn't! So it will not advance
rq_next_page properly. We need to fix that to make it like
nfs3svc_decode_readdirplusargs().

But I don't think this test should be needed. svc_alloc_arg() NULL-
terminates the ->rq_pages array which ->rq_next_page points in to.

So I think this test is correct as it stands, but
nfs4svc_decode_readdirargs() needs to be fixed.

> /* temporarily encode entry into next page, then move back to
> * current and next page in rq_respages[] */
> __be32 *p1, *tmp;
> int len1, len2;
> + caddr_t tmp_page_addr = NULL;
>
> /* grab next page for temporary storage of entry */
> p1 = tmp = page_address(*(page+1));
> @@ -977,7 +979,8 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
>
> /* determine entry word length and lengths to go in pages */
> num_entry_words = p1 - tmp;
> - len1 = curr_page_addr + PAGE_SIZE - (caddr_t)cd->buffer;
> + tmp_page_addr = (caddr_t)((unsigned long)cd->buffer & PAGE_MASK);
> + len1 = tmp_page_addr + PAGE_SIZE - (caddr_t)cd->buffer;

I think tmp_page_addr will always be the same as curr_page_addr.
At least, it will when nfs4svc_decode_readdirargs() is fixed.

Could you please revert the changed you've made to nfs3xdr.c (keeping
the change to nfsd3_proc_readdir()), and fix nfs4svc_decode_readdirargs
to be more like nfs4svc_decode_readdirplusargs, and see if that fixed
the problem?

Thanks a lot,
NeilBrown


> if ((num_entry_words << 2) < len1) {
> /* the actual number of words in the entry is less
> * than elen and can still fit in the current page
> @@ -1026,7 +1029,6 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> cd->buffer = p;
> cd->common.err = nfs_ok;
> return 0;
> -
> }
>
> int
> --
> 2.21.0


Attachments:
signature.asc (832.00 B)

2019-04-04 02:55:59

by Murphy Zhou

[permalink] [raw]
Subject: Re: [PATCH] nfsd/nfsd3_proc_readdir: check before encoding on temporary page

Hi,

Thanks for the review!

On Mon, Apr 1, 2019 at 9:20 AM NeilBrown <[email protected]> wrote:
>
> On Fri, Mar 29 2019, Murphy Zhou wrote:
>
> > After this commit
> > f875a79 nfsd: allow nfsv3 readdir request to be larger.
> > nfsv3 readdir request size can be larger than PAGE_SIZE. So if the
> > request and the directory are large enough, we can run out of pages
> > in rq_respages. Then the temporary page we are encoding on is a
> > random page, Oops happen.
> >
> > Listing a directory with 30000 files in it can trigger the panic.
> >
> > Fixing this by ensuring the temporary page resides in rq_respages.
> > And when counting how many bytes left currently from buffer to the
> > end of the page which buffer is pointing to, aka len1, do not
> > assume that curr_page_addr is at the beginning of the same page.
> > Also, update resp->count by going through all pages. Because the
> > pages may not be sequential, the old way is not safe.
> >
> > Fixes: f875a79 "nfsd: allow nfsv3 readdir request to be larger"
> > Signed-off-by: Murphy Zhou <[email protected]>
>
> Hi,
> thanks for finding these problems and submitting a patch. Clearly I
> should have tested better :-(
>
> > ---
> > fs/nfsd/nfs3proc.c | 17 +++++++++++++++--
> > fs/nfsd/nfs3xdr.c | 8 +++++---
> > 2 files changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> > index 8f933e84cec1..9bc32af4e2da 100644
> > --- a/fs/nfsd/nfs3proc.c
> > +++ b/fs/nfsd/nfs3proc.c
> > @@ -442,7 +442,9 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
> > struct nfsd3_readdirargs *argp = rqstp->rq_argp;
> > struct nfsd3_readdirres *resp = rqstp->rq_resp;
> > __be32 nfserr;
> > - int count;
> > + int count = 0;
> > + struct page **p;
> > + caddr_t page_addr = NULL;
> >
> > dprintk("nfsd: READDIR(3) %s %d bytes at %d\n",
> > SVCFH_fmt(&argp->fh),
> > @@ -462,7 +464,18 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
> > nfserr = nfsd_readdir(rqstp, &resp->fh, (loff_t*) &argp->cookie,
> > &resp->common, nfs3svc_encode_entry);
> > memcpy(resp->verf, argp->verf, 8);
> > - resp->count = resp->buffer - argp->buffer;
> > + count = 0;
> > + for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) {
> > + page_addr = page_address(*p);
> > +
> > + if (((caddr_t)resp->buffer >= page_addr) &&
> > + ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) {
> > + count += (caddr_t)resp->buffer - page_addr;
> > + break;
> > + }
> > + count += PAGE_SIZE;
> > + }
> > + resp->count = count >> 2;
> > if (resp->offset) {
> > loff_t offset = argp->cookie;
>
> This section makes perfect sense. You have copied code from
> nfsd3_proc_readdirplus() int nfsd3_proc_readdir(). This is needed
> because readdir doesn't limit replies to PAGE_SIZE any more.
>
> >
> > diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> > index 93fea246f676..1fabf1952bdb 100644
> > --- a/fs/nfsd/nfs3xdr.c
> > +++ b/fs/nfsd/nfs3xdr.c
> > @@ -953,6 +953,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> > break;
> > }
> >
> > +
>
> Adding blank lines is best avoided.
>
> > if ((caddr_t)(cd->buffer + elen) < (curr_page_addr + PAGE_SIZE)) {
> > /* encode entry in current page */
> >
> > @@ -961,11 +962,12 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> > if (plus)
> > p = encode_entryplus_baggage(cd, p, name, namlen, ino);
> > num_entry_words = p - cd->buffer;
> > - } else if (*(page+1) != NULL) {
> > + } else if (*(page+1) != NULL && (page+1) < cd->rqstp->rq_next_page) {
>
> Adding the test against rq_next_page looks correct.
> nfs3svc_decode_readdirplusargs increments ->rq_next_page over enough
> pages to cover the request count.
> However nfs4svc_decode_readdirargs doesn't! So it will not advance
> rq_next_page properly. We need to fix that to make it like
> nfs3svc_decode_readdirplusargs().
>
> But I don't think this test should be needed. svc_alloc_arg() NULL-
> terminates the ->rq_pages array which ->rq_next_page points in to.
>
> So I think this test is correct as it stands, but
> nfs4svc_decode_readdirargs() needs to be fixed.

Agreed. If we set the page pointers right. These checks wont be necessary.

>
> > /* temporarily encode entry into next page, then move back to
> > * current and next page in rq_respages[] */
> > __be32 *p1, *tmp;
> > int len1, len2;
> > + caddr_t tmp_page_addr = NULL;
> >
> > /* grab next page for temporary storage of entry */
> > p1 = tmp = page_address(*(page+1));
> > @@ -977,7 +979,8 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> >
> > /* determine entry word length and lengths to go in pages */
> > num_entry_words = p1 - tmp;
> > - len1 = curr_page_addr + PAGE_SIZE - (caddr_t)cd->buffer;
> > + tmp_page_addr = (caddr_t)((unsigned long)cd->buffer & PAGE_MASK);
> > + len1 = tmp_page_addr + PAGE_SIZE - (caddr_t)cd->buffer;
>
> I think tmp_page_addr will always be the same as curr_page_addr.
> At least, it will when nfs4svc_decode_readdirargs() is fixed.
>
> Could you please revert the changed you've made to nfs3xdr.c (keeping
> the change to nfsd3_proc_readdir()), and fix nfs4svc_decode_readdirargs
> to be more like nfs4svc_decode_readdirplusargs, and see if that fixed
> the problem?

Sure. A quick test on 3000 files passed. I'll post v2 if test on
30,000 files pass.

Thanks very much for reviewing!
M

>
> Thanks a lot,
> NeilBrown
>
>
> > if ((num_entry_words << 2) < len1) {
> > /* the actual number of words in the entry is less
> > * than elen and can still fit in the current page
> > @@ -1026,7 +1029,6 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> > cd->buffer = p;
> > cd->common.err = nfs_ok;
> > return 0;
> > -
> > }
> >
> > int
> > --
> > 2.21.0