2012-12-12 14:46:57

by Jeff Layton

[permalink] [raw]
Subject: non-aligned DIO reads on NFS are corrupting memory in 3.7.0

One of our QA folks found that the attached testcase would segfault
when run on a recent rhel6 kernel that has a backport of the pnfs dio
code. I get the same segfault when I run it on a 3.7.0 kernel as well.

I think the problem is that because the buffer we're reading into is on
the stack, the kernel is scribbling over the rest of the page after the
read and corrupting it.

The problem, I think is this block in nfs_direct_read_completion():

-----------------------[snip]-----------------------
if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) {
if (bytes > hdr->good_bytes)
zero_user(page, 0, PAGE_SIZE);
else if (hdr->good_bytes - bytes < PAGE_SIZE)
zero_user_segment(page,
hdr->good_bytes & ~PAGE_MASK,
PAGE_SIZE);
}
-----------------------[snip]-----------------------

If I comment that out, then the test passes and it doesn't scribble
over memory. I'm not clear on what that block is trying to accomplish.
If we get a short read in the DIO codepath, I don't think we ought to
be zeroing out the rest of the page. We should just return the number
of bytes read and be done with it.

I'm also suspicious of the "if (!PageCompound(page))" check in that
function as well. It doesn't seem like we ought to be marking pages
dirty in the DIO codepaths, should we?

--
Jeff Layton <[email protected]>


Attachments:
(No filename) (1.49 kB)
dio.c (1.44 kB)
Download all attachments

2012-12-12 16:20:12

by Fred Isaman

[permalink] [raw]
Subject: Re: non-aligned DIO reads on NFS are corrupting memory in 3.7.0

On Wed, Dec 12, 2012 at 9:46 AM, Jeff Layton <[email protected]> wrote:
> One of our QA folks found that the attached testcase would segfault
> when run on a recent rhel6 kernel that has a backport of the pnfs dio
> code. I get the same segfault when I run it on a 3.7.0 kernel as well.
>
> I think the problem is that because the buffer we're reading into is on
> the stack, the kernel is scribbling over the rest of the page after the
> read and corrupting it.
>
> The problem, I think is this block in nfs_direct_read_completion():
>
> -----------------------[snip]-----------------------
> if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) {
> if (bytes > hdr->good_bytes)
> zero_user(page, 0, PAGE_SIZE);
> else if (hdr->good_bytes - bytes < PAGE_SIZE)
> zero_user_segment(page,
> hdr->good_bytes & ~PAGE_MASK,
> PAGE_SIZE);
> }
> -----------------------[snip]-----------------------
>
> If I comment that out, then the test passes and it doesn't scribble
> over memory. I'm not clear on what that block is trying to accomplish.
> If we get a short read in the DIO codepath, I don't think we ought to
> be zeroing out the rest of the page. We should just return the number
> of bytes read and be done with it.
>

I would say the problem is not zeroing memory, but that the code isn't
taking into account the offsets into the page.


> I'm also suspicious of the "if (!PageCompound(page))" check in that
> function as well. It doesn't seem like we ought to be marking pages
> dirty in the DIO codepaths, should we?
>
> --
> Jeff Layton <[email protected]>

I'm not sure if we should, but code to do so has been around forever.
The exception for PageCompound is from commit 566dd6064e89b "NFS: Make
directIO aware of compound pages", almost 7 years ago.

Fred

2012-12-12 16:31:45

by Jeff Layton

[permalink] [raw]
Subject: Re: non-aligned DIO reads on NFS are corrupting memory in 3.7.0

On Wed, 12 Dec 2012 11:20:11 -0500
Fred Isaman <[email protected]> wrote:

> On Wed, Dec 12, 2012 at 9:46 AM, Jeff Layton <[email protected]> wrote:
> > One of our QA folks found that the attached testcase would segfault
> > when run on a recent rhel6 kernel that has a backport of the pnfs dio
> > code. I get the same segfault when I run it on a 3.7.0 kernel as well.
> >
> > I think the problem is that because the buffer we're reading into is on
> > the stack, the kernel is scribbling over the rest of the page after the
> > read and corrupting it.
> >
> > The problem, I think is this block in nfs_direct_read_completion():
> >
> > -----------------------[snip]-----------------------
> > if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) {
> > if (bytes > hdr->good_bytes)
> > zero_user(page, 0, PAGE_SIZE);
> > else if (hdr->good_bytes - bytes < PAGE_SIZE)
> > zero_user_segment(page,
> > hdr->good_bytes & ~PAGE_MASK,
> > PAGE_SIZE);
> > }
> > -----------------------[snip]-----------------------
> >
> > If I comment that out, then the test passes and it doesn't scribble
> > over memory. I'm not clear on what that block is trying to accomplish.
> > If we get a short read in the DIO codepath, I don't think we ought to
> > be zeroing out the rest of the page. We should just return the number
> > of bytes read and be done with it.
> >
>
> I would say the problem is not zeroing memory, but that the code isn't
> taking into account the offsets into the page.
>

Erm maybe...

I don't get it though. Why would you ever want to zero out the rest of
the buffer on a DIO read() request? You're certainly under no obligation
to do so. If you didn't get all of the data requested, you're going to
return a number that's less than "count", and you should be fine to
just ignore the rest of the buffer.

>
> > I'm also suspicious of the "if (!PageCompound(page))" check in that
> > function as well. It doesn't seem like we ought to be marking pages
> > dirty in the DIO codepaths, should we?
> >
>
> I'm not sure if we should, but code to do so has been around forever.
> The exception for PageCompound is from commit 566dd6064e89b "NFS: Make
> directIO aware of compound pages", almost 7 years ago.
>
> Fred
>

Yeah. Maybe it's concern with someone doing DIO reads into a mmapped
buffer? Dunno...

--
Jeff Layton <[email protected]>

2012-12-12 16:30:35

by Myklebust, Trond

[permalink] [raw]
Subject: Re: non-aligned DIO reads on NFS are corrupting memory in 3.7.0

T24gV2VkLCAyMDEyLTEyLTEyIGF0IDExOjIwIC0wNTAwLCBGcmVkIElzYW1hbiB3cm90ZToNCj4g
T24gV2VkLCBEZWMgMTIsIDIwMTIgYXQgOTo0NiBBTSwgSmVmZiBMYXl0b24gPGpsYXl0b25AcmVk
aGF0LmNvbT4gd3JvdGU6DQo+ID4gT25lIG9mIG91ciBRQSBmb2xrcyBmb3VuZCB0aGF0IHRoZSBh
dHRhY2hlZCB0ZXN0Y2FzZSB3b3VsZCBzZWdmYXVsdA0KPiA+IHdoZW4gcnVuIG9uIGEgcmVjZW50
IHJoZWw2IGtlcm5lbCB0aGF0IGhhcyBhIGJhY2twb3J0IG9mIHRoZSBwbmZzIGRpbw0KPiA+IGNv
ZGUuIEkgZ2V0IHRoZSBzYW1lIHNlZ2ZhdWx0IHdoZW4gSSBydW4gaXQgb24gYSAzLjcuMCBrZXJu
ZWwgYXMgd2VsbC4NCj4gPg0KPiA+IEkgdGhpbmsgdGhlIHByb2JsZW0gaXMgdGhhdCBiZWNhdXNl
IHRoZSBidWZmZXIgd2UncmUgcmVhZGluZyBpbnRvIGlzIG9uDQo+ID4gdGhlIHN0YWNrLCB0aGUg
a2VybmVsIGlzIHNjcmliYmxpbmcgb3ZlciB0aGUgcmVzdCBvZiB0aGUgcGFnZSBhZnRlciB0aGUN
Cj4gPiByZWFkIGFuZCBjb3JydXB0aW5nIGl0Lg0KPiA+DQo+ID4gVGhlIHByb2JsZW0sIEkgdGhp
bmsgaXMgdGhpcyBibG9jayBpbiBuZnNfZGlyZWN0X3JlYWRfY29tcGxldGlvbigpOg0KPiA+DQo+
ID4gLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS1bc25pcF0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQ0K
PiA+ICAgICAgICAgICAgICAgICBpZiAodGVzdF9iaXQoTkZTX0lPSERSX0VPRiwgJmhkci0+Zmxh
Z3MpKSB7DQo+ID4gICAgICAgICAgICAgICAgICAgICAgICAgaWYgKGJ5dGVzID4gaGRyLT5nb29k
X2J5dGVzKQ0KPiA+ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgemVyb191c2VyKHBh
Z2UsIDAsIFBBR0VfU0laRSk7DQo+ID4gICAgICAgICAgICAgICAgICAgICAgICAgZWxzZSBpZiAo
aGRyLT5nb29kX2J5dGVzIC0gYnl0ZXMgPCBQQUdFX1NJWkUpDQo+ID4gICAgICAgICAgICAgICAg
ICAgICAgICAgICAgICAgICB6ZXJvX3VzZXJfc2VnbWVudChwYWdlLA0KPiA+ICAgICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICBoZHItPmdvb2RfYnl0ZXMgJiB+UEFHRV9NQVNL
LA0KPiA+ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBQQUdFX1NJWkUp
Ow0KPiA+ICAgICAgICAgICAgICAgICB9DQo+ID4gLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS1bc25p
cF0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQ0KPiA+DQo+ID4gSWYgSSBjb21tZW50IHRoYXQgb3V0
LCB0aGVuIHRoZSB0ZXN0IHBhc3NlcyBhbmQgaXQgZG9lc24ndCBzY3JpYmJsZQ0KPiA+IG92ZXIg
bWVtb3J5LiBJJ20gbm90IGNsZWFyIG9uIHdoYXQgdGhhdCBibG9jayBpcyB0cnlpbmcgdG8gYWNj
b21wbGlzaC4NCj4gPiBJZiB3ZSBnZXQgYSBzaG9ydCByZWFkIGluIHRoZSBESU8gY29kZXBhdGgs
IEkgZG9uJ3QgdGhpbmsgd2Ugb3VnaHQgdG8NCj4gPiBiZSB6ZXJvaW5nIG91dCB0aGUgcmVzdCBv
ZiB0aGUgcGFnZS4gV2Ugc2hvdWxkIGp1c3QgcmV0dXJuIHRoZSBudW1iZXINCj4gPiBvZiBieXRl
cyByZWFkIGFuZCBiZSBkb25lIHdpdGggaXQuDQo+ID4NCj4gDQo+IEkgd291bGQgc2F5IHRoZSBw
cm9ibGVtIGlzIG5vdCB6ZXJvaW5nIG1lbW9yeSwgYnV0IHRoYXQgdGhlIGNvZGUgaXNuJ3QNCj4g
dGFraW5nIGludG8gYWNjb3VudCB0aGUgb2Zmc2V0cyBpbnRvIHRoZSBwYWdlLg0KDQpUaGUgbWVt
b3J5IHJlZ2lvbiBsaWVzIGFmdGVyIHRoZSBFT0YsIHNvIHdlJ3JlIG5vdCBjb3VudGluZyBpdCBp
bg0KZHJlcS0+Y291bnQuIE1vcmUgaW1wb3J0YW50bHksIGl0IHdvbid0IGJlIGFjY291bnRlZCBm
b3IgaW4gdGhlIHJlYWQoKQ0Kc3lzdGVtIGNhbGwgcmV0dXJuIHZhbHVlLCBzbyBJIHRoaW5rIHdl
IHNob3VsZCBhdm9pZCB0b3VjaGluZyBpdC4NCg0KPiA+IEknbSBhbHNvIHN1c3BpY2lvdXMgb2Yg
dGhlICJpZiAoIVBhZ2VDb21wb3VuZChwYWdlKSkiIGNoZWNrIGluIHRoYXQNCj4gPiBmdW5jdGlv
biBhcyB3ZWxsLiBJdCBkb2Vzbid0IHNlZW0gbGlrZSB3ZSBvdWdodCB0byBiZSBtYXJraW5nIHBh
Z2VzDQo+ID4gZGlydHkgaW4gdGhlIERJTyBjb2RlcGF0aHMsIHNob3VsZCB3ZT8NCj4gPg0KPiA+
IC0tDQo+ID4gSmVmZiBMYXl0b24gPGpsYXl0b25AcmVkaGF0LmNvbT4NCj4gDQo+IEknbSBub3Qg
c3VyZSBpZiB3ZSBzaG91bGQsIGJ1dCBjb2RlIHRvIGRvIHNvIGhhcyBiZWVuIGFyb3VuZCBmb3Jl
dmVyLg0KPiBUaGUgZXhjZXB0aW9uIGZvciBQYWdlQ29tcG91bmQgaXMgZnJvbSBjb21taXQgNTY2
ZGQ2MDY0ZTg5YiAiTkZTOiBNYWtlDQo+IGRpcmVjdElPIGF3YXJlIG9mIGNvbXBvdW5kIHBhZ2Vz
IiwgYWxtb3N0IDcgeWVhcnMgYWdvLg0KDQpSaWdodC4gVGhlIGNvZGUgZm9yIGRvaW5nIHRoaXMg
Z29lcyBiYWNrIHRvIHRoZSBvcmlnaW5hbCBPX0RJUkVDVA0KaW1wbGVtZW50YXRpb24gaW4gMjAw
My4gSUlSQywgQW5kcmV3IE1vcnRvbiBhc2tlZCB1cyB0byBkbyBpdCBpbiBvcmRlcg0KdG8gbWFr
ZSB0aGluZ3MgbGlrZSBPX0RJUkVDVCByZWFkIGludG8gYW4gbW1hcHBlZCBtZW1vcnkgcmVnaW9u
IHdvcmsNCmNvcnJlY3RseS4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGll
bnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cu
bmV0YXBwLmNvbQ0K