2010-04-20 02:16:59

by NeilBrown

[permalink] [raw]
Subject: [PATCH] bug in read_buf


Surely this can never have worked... which implies that the code has
never been used?

When read_buf is called to move over to the next page in the pagelist
of an NFSv4 request, it sets argp->end to essentially a random
number, certainly not an address within the page which argp->p now
points to. So subsequent calls to READ_BUF will think there is much
more than a page of spare space (the cast to u32 ensures an unsigned
comparison) so we can expect to fall off the end of the second
page.

I guess we never ever receive requests with any operation starting
beyond the first page!

[[
I found this while looking at why fsstress over NFS over RDMA caused
a bad memory dereference in READ32, suggesting that 'p' had a bad
value. However it was ffff8801299188f0, which is not an "I've fallen
off the end of the page" sort of value. So I think it must be a
different bug :-( It is as if the page is being unmapped underneath
us...
]]
NeilBrown




diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index e170317..34ccf81 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -161,10 +161,10 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
argp->p = page_address(argp->pagelist[0]);
argp->pagelist++;
if (argp->pagelen < PAGE_SIZE) {
- argp->end = p + (argp->pagelen>>2);
+ argp->end = argp->p + (argp->pagelen>>2);
argp->pagelen = 0;
} else {
- argp->end = p + (PAGE_SIZE>>2);
+ argp->end = argp->p + (PAGE_SIZE>>2);
argp->pagelen -= PAGE_SIZE;
}
memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
@@ -1426,10 +1426,10 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
argp->p = page_address(argp->pagelist[0]);
argp->pagelist++;
if (argp->pagelen < PAGE_SIZE) {
- argp->end = p + (argp->pagelen>>2);
+ argp->end = argp->p + (argp->pagelen>>2);
argp->pagelen = 0;
} else {
- argp->end = p + (PAGE_SIZE>>2);
+ argp->end = argp->p + (PAGE_SIZE>>2);
argp->pagelen -= PAGE_SIZE;
}
}




2010-04-20 16:51:53

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] bug in read_buf

On Tue, Apr 20, 2010 at 12:16:52PM +1000, Neil Brown wrote:
>
> Surely this can never have worked... which implies that the code has
> never been used?
>
> When read_buf is called to move over to the next page in the pagelist
> of an NFSv4 request, it sets argp->end to essentially a random
> number, certainly not an address within the page which argp->p now
> points to. So subsequent calls to READ_BUF will think there is much
> more than a page of spare space (the cast to u32 ensures an unsigned
> comparison) so we can expect to fall off the end of the second
> page.

Yipes, thanks.

> I guess we never ever receive requests with any operation starting
> beyond the first page!

putfh-write-getattr, for example, is common enough. The write decoding
should leave arg->end set correctly. But there are two read_buf()'s in
decode_getattr(), and I can't see why we don't hit this bug on a write
that leaves that final getattr exactly straddling a page boundary.

--b.

> [[
> I found this while looking at why fsstress over NFS over RDMA caused
> a bad memory dereference in READ32, suggesting that 'p' had a bad
> value. However it was ffff8801299188f0, which is not an "I've fallen
> off the end of the page" sort of value. So I think it must be a
> different bug :-( It is as if the page is being unmapped underneath
> us...
> ]]
> NeilBrown
>
>
>
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index e170317..34ccf81 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -161,10 +161,10 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
> argp->p = page_address(argp->pagelist[0]);
> argp->pagelist++;
> if (argp->pagelen < PAGE_SIZE) {
> - argp->end = p + (argp->pagelen>>2);
> + argp->end = argp->p + (argp->pagelen>>2);
> argp->pagelen = 0;
> } else {
> - argp->end = p + (PAGE_SIZE>>2);
> + argp->end = argp->p + (PAGE_SIZE>>2);
> argp->pagelen -= PAGE_SIZE;
> }
> memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
> @@ -1426,10 +1426,10 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
> argp->p = page_address(argp->pagelist[0]);
> argp->pagelist++;
> if (argp->pagelen < PAGE_SIZE) {
> - argp->end = p + (argp->pagelen>>2);
> + argp->end = argp->p + (argp->pagelen>>2);
> argp->pagelen = 0;
> } else {
> - argp->end = p + (PAGE_SIZE>>2);
> + argp->end = argp->p + (PAGE_SIZE>>2);
> argp->pagelen -= PAGE_SIZE;
> }
> }
>
>

2010-04-22 15:41:50

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH] bug in read_buf

On Tue, Apr 20, 2010 at 3:39 PM, J. Bruce Fields <[email protected]>=
wrote:
> On Tue, Apr 20, 2010 at 03:24:59PM -0400, William A. (Andy) Adamson w=
rote:
>> On Tue, Apr 20, 2010 at 12:51 PM, J. Bruce Fields <bfields@fieldses.=
org> wrote:
>> > On Tue, Apr 20, 2010 at 12:16:52PM +1000, Neil Brown wrote:
>> >>
>> >> Surely this can never have worked... which implies that the code =
has
>> >> never been used?
>> >>
>> >> When read_buf is called to move over to the next page in the page=
list
>> >> of an NFSv4 request, it sets argp->end to essentially a random
>> >> number, certainly not an address within the page which argp->p no=
w
>> >> points to. =A0So subsequent calls to READ_BUF will think there is=
much
>> >> more than a page of spare space (the cast to u32 ensures an unsig=
ned
>> >> comparison) so we can expect to fall off the end of the second
>> >> page.
>> >
>> > Yipes, thanks.
>> >
>> >> I guess we never ever receive requests with any operation startin=
g
>> >> beyond the first page!
>> >
>> > putfh-write-getattr, for example, is common enough. =A0The write d=
ecoding
>> > should leave arg->end set correctly. =A0But there are two read_buf=
()'s in
>> > decode_getattr(), and I can't see why we don't hit this bug on a w=
rite
>> > that leaves that final getattr exactly straddling a page boundary.
>>
>> The write data is dumped into the rq_vec which has non-contiguous
>> pages. So the xdr_buf head only holds the putfh result, the short
>> write response header (v4 stateid, offset, how, length, etc), and th=
en
>> the getattr. so there is plenty of space.
>
> This is the server-side write-decoding, so you could see:
>
>
> =A0 =A0 =A0 =A0rpc header | putfh | write ... data ... | getattr
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ^
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 |
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
=A0 =A0 =A0 =A0 =A0 =A0page boundary here


ulp - you're right.

-->Andy

>
> --b.
>
>>
>> -->Andy
>>
>> >
>> > --b.
>> >
>> >> [[
>> >> I found this while looking at why fsstress over NFS over RDMA cau=
sed
>> >> a bad memory dereference in READ32, suggesting that 'p' had a bad
>> >> value. =A0However it was ffff8801299188f0, which is not an "I've =
fallen
>> >> off the end of the page" sort of value. =A0So I think it must be =
a
>> >> different bug :-( =A0It is as if the page is being unmapped under=
neath
>> >> us...
>> >> ]]
>> >> NeilBrown
>> >>
>> >>
>> >>
>> >>
>> >> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> >> index e170317..34ccf81 100644
>> >> --- a/fs/nfsd/nfs4xdr.c
>> >> +++ b/fs/nfsd/nfs4xdr.c
>> >> @@ -161,10 +161,10 @@ static __be32 *read_buf(struct nfsd4_compou=
ndargs *argp, u32 nbytes)
>> >> =A0 =A0 =A0 argp->p =3D page_address(argp->pagelist[0]);
>> >> =A0 =A0 =A0 argp->pagelist++;
>> >> =A0 =A0 =A0 if (argp->pagelen < PAGE_SIZE) {
>> >> - =A0 =A0 =A0 =A0 =A0 =A0 argp->end =3D p + (argp->pagelen>>2);
>> >> + =A0 =A0 =A0 =A0 =A0 =A0 argp->end =3D argp->p + (argp->pagelen>=
>2);
>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->pagelen =3D 0;
>> >> =A0 =A0 =A0 } else {
>> >> - =A0 =A0 =A0 =A0 =A0 =A0 argp->end =3D p + (PAGE_SIZE>>2);
>> >> + =A0 =A0 =A0 =A0 =A0 =A0 argp->end =3D argp->p + (PAGE_SIZE>>2);
>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->pagelen -=3D PAGE_SIZE;
>> >> =A0 =A0 =A0 }
>> >> =A0 =A0 =A0 memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
>> >> @@ -1426,10 +1426,10 @@ nfsd4_decode_compound(struct nfsd4_compou=
ndargs *argp)
>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->p =3D page_addr=
ess(argp->pagelist[0]);
>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->pagelist++;
>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (argp->pagelen < P=
AGE_SIZE) {
>> >> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->e=
nd =3D p + (argp->pagelen>>2);
>> >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->e=
nd =3D argp->p + (argp->pagelen>>2);
>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp-=
>pagelen =3D 0;
>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } else {
>> >> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->e=
nd =3D p + (PAGE_SIZE>>2);
>> >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->e=
nd =3D argp->p + (PAGE_SIZE>>2);
>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp-=
>pagelen -=3D PAGE_SIZE;
>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
>> >>
>> >>
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-nf=
s" in
>> > the body of a message to [email protected]
>> > More majordomo info at =A0http://vger.kernel.org/majordomo-info.ht=
ml
>> >
>

2010-04-20 19:25:00

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH] bug in read_buf

On Tue, Apr 20, 2010 at 12:51 PM, J. Bruce Fields <[email protected]=
> wrote:
> On Tue, Apr 20, 2010 at 12:16:52PM +1000, Neil Brown wrote:
>>
>> Surely this can never have worked... which implies that the code has
>> never been used?
>>
>> When read_buf is called to move over to the next page in the pagelis=
t
>> of an NFSv4 request, it sets argp->end to essentially a random
>> number, certainly not an address within the page which argp->p now
>> points to. =A0So subsequent calls to READ_BUF will think there is mu=
ch
>> more than a page of spare space (the cast to u32 ensures an unsigned
>> comparison) so we can expect to fall off the end of the second
>> page.
>
> Yipes, thanks.
>
>> I guess we never ever receive requests with any operation starting
>> beyond the first page!
>
> putfh-write-getattr, for example, is common enough. =A0The write deco=
ding
> should leave arg->end set correctly. =A0But there are two read_buf()'=
s in
> decode_getattr(), and I can't see why we don't hit this bug on a writ=
e
> that leaves that final getattr exactly straddling a page boundary.

The write data is dumped into the rq_vec which has non-contiguous
pages. So the xdr_buf head only holds the putfh result, the short
write response header (v4 stateid, offset, how, length, etc), and then
the getattr. so there is plenty of space.

-->Andy

>
> --b.
>
>> [[
>> I found this while looking at why fsstress over NFS over RDMA caused
>> a bad memory dereference in READ32, suggesting that 'p' had a bad
>> value. =A0However it was ffff8801299188f0, which is not an "I've fal=
len
>> off the end of the page" sort of value. =A0So I think it must be a
>> different bug :-( =A0It is as if the page is being unmapped undernea=
th
>> us...
>> ]]
>> NeilBrown
>>
>>
>>
>>
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index e170317..34ccf81 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -161,10 +161,10 @@ static __be32 *read_buf(struct nfsd4_compounda=
rgs *argp, u32 nbytes)
>> =A0 =A0 =A0 argp->p =3D page_address(argp->pagelist[0]);
>> =A0 =A0 =A0 argp->pagelist++;
>> =A0 =A0 =A0 if (argp->pagelen < PAGE_SIZE) {
>> - =A0 =A0 =A0 =A0 =A0 =A0 argp->end =3D p + (argp->pagelen>>2);
>> + =A0 =A0 =A0 =A0 =A0 =A0 argp->end =3D argp->p + (argp->pagelen>>2)=
;
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->pagelen =3D 0;
>> =A0 =A0 =A0 } else {
>> - =A0 =A0 =A0 =A0 =A0 =A0 argp->end =3D p + (PAGE_SIZE>>2);
>> + =A0 =A0 =A0 =A0 =A0 =A0 argp->end =3D argp->p + (PAGE_SIZE>>2);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->pagelen -=3D PAGE_SIZE;
>> =A0 =A0 =A0 }
>> =A0 =A0 =A0 memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
>> @@ -1426,10 +1426,10 @@ nfsd4_decode_compound(struct nfsd4_compounda=
rgs *argp)
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->p =3D page_address=
(argp->pagelist[0]);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->pagelist++;
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (argp->pagelen < PAGE=
_SIZE) {
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->end =
=3D p + (argp->pagelen>>2);
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->end =
=3D argp->p + (argp->pagelen>>2);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->pa=
gelen =3D 0;
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } else {
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->end =
=3D p + (PAGE_SIZE>>2);
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->end =
=3D argp->p + (PAGE_SIZE>>2);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 argp->pa=
gelen -=3D PAGE_SIZE;
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to [email protected]
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>

2010-04-20 19:39:46

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] bug in read_buf

On Tue, Apr 20, 2010 at 03:24:59PM -0400, William A. (Andy) Adamson wro=
te:
> On Tue, Apr 20, 2010 at 12:51 PM, J. Bruce Fields <[email protected]=
rg> wrote:
> > On Tue, Apr 20, 2010 at 12:16:52PM +1000, Neil Brown wrote:
> >>
> >> Surely this can never have worked... which implies that the code h=
as
> >> never been used?
> >>
> >> When read_buf is called to move over to the next page in the pagel=
ist
> >> of an NFSv4 request, it sets argp->end to essentially a random
> >> number, certainly not an address within the page which argp->p now
> >> points to. =C2=A0So subsequent calls to READ_BUF will think there =
is much
> >> more than a page of spare space (the cast to u32 ensures an unsign=
ed
> >> comparison) so we can expect to fall off the end of the second
> >> page.
> >
> > Yipes, thanks.
> >
> >> I guess we never ever receive requests with any operation starting
> >> beyond the first page!
> >
> > putfh-write-getattr, for example, is common enough. =C2=A0The write=
decoding
> > should leave arg->end set correctly. =C2=A0But there are two read_b=
uf()'s in
> > decode_getattr(), and I can't see why we don't hit this bug on a wr=
ite
> > that leaves that final getattr exactly straddling a page boundary.
>=20
> The write data is dumped into the rq_vec which has non-contiguous
> pages. So the xdr_buf head only holds the putfh result, the short
> write response header (v4 stateid, offset, how, length, etc), and the=
n
> the getattr. so there is plenty of space.

This is the server-side write-decoding, so you could see:


rpc header | putfh | write ... data ... | getattr
^
|
page boundary here

--b.

>=20
> -->Andy
>=20
> >
> > --b.
> >
> >> [[
> >> I found this while looking at why fsstress over NFS over RDMA caus=
ed
> >> a bad memory dereference in READ32, suggesting that 'p' had a bad
> >> value. =C2=A0However it was ffff8801299188f0, which is not an "I'v=
e fallen
> >> off the end of the page" sort of value. =C2=A0So I think it must b=
e a
> >> different bug :-( =C2=A0It is as if the page is being unmapped und=
erneath
> >> us...
> >> ]]
> >> NeilBrown
> >>
> >>
> >>
> >>
> >> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> >> index e170317..34ccf81 100644
> >> --- a/fs/nfsd/nfs4xdr.c
> >> +++ b/fs/nfsd/nfs4xdr.c
> >> @@ -161,10 +161,10 @@ static __be32 *read_buf(struct nfsd4_compoun=
dargs *argp, u32 nbytes)
> >> =C2=A0 =C2=A0 =C2=A0 argp->p =3D page_address(argp->pagelist[0]);
> >> =C2=A0 =C2=A0 =C2=A0 argp->pagelist++;
> >> =C2=A0 =C2=A0 =C2=A0 if (argp->pagelen < PAGE_SIZE) {
> >> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D p + (arg=
p->pagelen>>2);
> >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D argp->p =
+ (argp->pagelen>>2);
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->pagelen =3D=
0;
> >> =C2=A0 =C2=A0 =C2=A0 } else {
> >> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D p + (PAG=
E_SIZE>>2);
> >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D argp->p =
+ (PAGE_SIZE>>2);
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->pagelen -=3D=
PAGE_SIZE;
> >> =C2=A0 =C2=A0 =C2=A0 }
> >> =C2=A0 =C2=A0 =C2=A0 memcpy(((char*)p)+avail, argp->p, (nbytes - a=
vail));
> >> @@ -1426,10 +1426,10 @@ nfsd4_decode_compound(struct nfsd4_compoun=
dargs *argp)
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 argp->p =3D page_address(argp->pagelist[0]);
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 argp->pagelist++;
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 if (argp->pagelen < PAGE_SIZE) {
> >> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D p + (argp->pagelen>>2);
> >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D argp->p + (argp->pagelen>=
>2);
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->pagelen =3D 0;
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 } else {
> >> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D p + (PAGE_SIZE>>2);
> >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D argp->p + (PAGE_SIZE>>2);
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->pagelen -=3D PAGE_SIZE;
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 }
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }
> >>
> >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs=
" in
> > the body of a message to [email protected]
> > More majordomo info at =C2=A0http://vger.kernel.org/majordomo-info.=
html
> >

2010-04-21 22:35:29

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] bug in read_buf

On Tue, Apr 20, 2010 at 03:39:44PM -0400, J. Bruce Fields wrote:
> On Tue, Apr 20, 2010 at 03:24:59PM -0400, William A. (Andy) Adamson w=
rote:
> > On Tue, Apr 20, 2010 at 12:51 PM, J. Bruce Fields <bfields@fieldses=
=2Eorg> wrote:
> > > On Tue, Apr 20, 2010 at 12:16:52PM +1000, Neil Brown wrote:
> > >>
> > >> Surely this can never have worked... which implies that the code=
has
> > >> never been used?
> > >>
> > >> When read_buf is called to move over to the next page in the pag=
elist
> > >> of an NFSv4 request, it sets argp->end to essentially a random
> > >> number, certainly not an address within the page which argp->p n=
ow
> > >> points to. =C2=A0So subsequent calls to READ_BUF will think ther=
e is much
> > >> more than a page of spare space (the cast to u32 ensures an unsi=
gned
> > >> comparison) so we can expect to fall off the end of the second
> > >> page.
> > >
> > > Yipes, thanks.
> > >
> > >> I guess we never ever receive requests with any operation starti=
ng
> > >> beyond the first page!
> > >
> > > putfh-write-getattr, for example, is common enough. =C2=A0The wri=
te decoding
> > > should leave arg->end set correctly. =C2=A0But there are two read=
_buf()'s in
> > > decode_getattr(), and I can't see why we don't hit this bug on a =
write
> > > that leaves that final getattr exactly straddling a page boundary=
=2E
> >=20
> > The write data is dumped into the rq_vec which has non-contiguous
> > pages. So the xdr_buf head only holds the putfh result, the short
> > write response header (v4 stateid, offset, how, length, etc), and t=
hen
> > the getattr. so there is plenty of space.
>=20
> This is the server-side write-decoding, so you could see:
>=20
>=20
> rpc header | putfh | write ... data ... | getattr
> ^
> |
> page boundary here

Hm, I guess even when argp->end is wrong, argp->p is always set to
something sane; so on the next READ_BUF(), when you hit the

nbytes <=3D (u32)((char *)argp->end - (char *)argp->p

case, you do

p =3D argp->p;
argp->p +=3D XDR_QUADLEN(nbytes);

and p is something reasonable. "end" stays wrong, but that won't be a
problem until you run past the end of the *next* page, which it would
take a very unusual compound to do.

--b.

>=20
> --b.
>=20
> >=20
> > -->Andy
> >=20
> > >
> > > --b.
> > >
> > >> [[
> > >> I found this while looking at why fsstress over NFS over RDMA ca=
used
> > >> a bad memory dereference in READ32, suggesting that 'p' had a ba=
d
> > >> value. =C2=A0However it was ffff8801299188f0, which is not an "I=
've fallen
> > >> off the end of the page" sort of value. =C2=A0So I think it must=
be a
> > >> different bug :-( =C2=A0It is as if the page is being unmapped u=
nderneath
> > >> us...
> > >> ]]
> > >> NeilBrown
> > >>
> > >>
> > >>
> > >>
> > >> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > >> index e170317..34ccf81 100644
> > >> --- a/fs/nfsd/nfs4xdr.c
> > >> +++ b/fs/nfsd/nfs4xdr.c
> > >> @@ -161,10 +161,10 @@ static __be32 *read_buf(struct nfsd4_compo=
undargs *argp, u32 nbytes)
> > >> =C2=A0 =C2=A0 =C2=A0 argp->p =3D page_address(argp->pagelist[0])=
;
> > >> =C2=A0 =C2=A0 =C2=A0 argp->pagelist++;
> > >> =C2=A0 =C2=A0 =C2=A0 if (argp->pagelen < PAGE_SIZE) {
> > >> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D p + (a=
rgp->pagelen>>2);
> > >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D argp->=
p + (argp->pagelen>>2);
> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->pagelen =3D=
0;
> > >> =C2=A0 =C2=A0 =C2=A0 } else {
> > >> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D p + (P=
AGE_SIZE>>2);
> > >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D argp->=
p + (PAGE_SIZE>>2);
> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->pagelen -=
=3D PAGE_SIZE;
> > >> =C2=A0 =C2=A0 =C2=A0 }
> > >> =C2=A0 =C2=A0 =C2=A0 memcpy(((char*)p)+avail, argp->p, (nbytes -=
avail));
> > >> @@ -1426,10 +1426,10 @@ nfsd4_decode_compound(struct nfsd4_compo=
undargs *argp)
> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 argp->p =3D page_address(argp->pagelist[0]);
> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 argp->pagelist++;
> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 if (argp->pagelen < PAGE_SIZE) {
> > >> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D p + (argp->pagelen>>2=
);
> > >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D argp->p + (argp->page=
len>>2);
> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->pagelen =3D 0;
> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 } else {
> > >> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D p + (PAGE_SIZE>>2);
> > >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->end =3D argp->p + (PAGE_SIZE>=
>2);
> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 argp->pagelen -=3D PAGE_SIZE;
> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 }
> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }
> > >>
> > >>
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-n=
fs" in
> > > the body of a message to [email protected]
> > > More majordomo info at =C2=A0http://vger.kernel.org/majordomo-inf=
o.html
> > >

2010-04-21 22:36:06

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] bug in read_buf

On Wed, Apr 21, 2010 at 06:35:27PM -0400, J. Bruce Fields wrote:
> On Tue, Apr 20, 2010 at 03:39:44PM -0400, J. Bruce Fields wrote:
> > On Tue, Apr 20, 2010 at 03:24:59PM -0400, William A. (Andy) Adamson=
wrote:
> > > On Tue, Apr 20, 2010 at 12:51 PM, J. Bruce Fields <bfields@fields=
es.org> wrote:
> > > > On Tue, Apr 20, 2010 at 12:16:52PM +1000, Neil Brown wrote:
> > > >>
> > > >> Surely this can never have worked... which implies that the co=
de has
> > > >> never been used?
> > > >>
> > > >> When read_buf is called to move over to the next page in the p=
agelist
> > > >> of an NFSv4 request, it sets argp->end to essentially a random
> > > >> number, certainly not an address within the page which argp->p=
now
> > > >> points to. =C2=A0So subsequent calls to READ_BUF will think th=
ere is much
> > > >> more than a page of spare space (the cast to u32 ensures an un=
signed
> > > >> comparison) so we can expect to fall off the end of the second
> > > >> page.
> > > >
> > > > Yipes, thanks.
> > > >
> > > >> I guess we never ever receive requests with any operation star=
ting
> > > >> beyond the first page!
> > > >
> > > > putfh-write-getattr, for example, is common enough. =C2=A0The w=
rite decoding
> > > > should leave arg->end set correctly. =C2=A0But there are two re=
ad_buf()'s in
> > > > decode_getattr(), and I can't see why we don't hit this bug on =
a write
> > > > that leaves that final getattr exactly straddling a page bounda=
ry.
> > >=20
> > > The write data is dumped into the rq_vec which has non-contiguous
> > > pages. So the xdr_buf head only holds the putfh result, the short
> > > write response header (v4 stateid, offset, how, length, etc), and=
then
> > > the getattr. so there is plenty of space.
> >=20
> > This is the server-side write-decoding, so you could see:
> >=20
> >=20
> > rpc header | putfh | write ... data ... | getattr
> > ^
> > |
> > page boundary here
>=20
> Hm, I guess even when argp->end is wrong, argp->p is always set to
> something sane; so on the next READ_BUF(), when you hit the
>=20
> nbytes <=3D (u32)((char *)argp->end - (char *)argp->p
>=20
> case, you do
>=20
> p =3D argp->p;
> argp->p +=3D XDR_QUADLEN(nbytes);
>=20
> and p is something reasonable. "end" stays wrong, but that won't be =
a
> problem until you run past the end of the *next* page, which it would
> take a very unusual compound to do.

(Nevertheless: applied, for 2.6.34 and stable.)

--b.

2010-04-21 23:08:12

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] bug in read_buf

On Wed, 21 Apr 2010 18:36:05 -0400
"J. Bruce Fields" <[email protected]> wrote:


> > Hm, I guess even when argp->end is wrong, argp->p is always set to
> > something sane; so on the next READ_BUF(), when you hit the
> >
> > nbytes <= (u32)((char *)argp->end - (char *)argp->p
> >
> > case, you do
> >
> > p = argp->p;
> > argp->p += XDR_QUADLEN(nbytes);
> >
> > and p is something reasonable. "end" stays wrong, but that won't be a
> > problem until you run past the end of the *next* page, which it would
> > take a very unusual compound to do.

Yes, it would not be an easy bug to trigger ... it takes away some of the
thrill of finding a bug when you discover that it only affects a corner case
that never ever happens :-(

>
> (Nevertheless: applied, for 2.6.34 and stable.)

Thanks.

NeilBrown