2016-07-26 02:41:42

by Seiichi Ikarashi

[permalink] [raw]
Subject: [PATCH v2] Prevent rqstp->rq_pages[RPCSVC_MAXPAGES] overrun

If over-"RPCSVC_MAXPAGES" pages are sent from file system through pipe_buffer,
nfsd_splice_actor() corrupts struct svc_rqst and results in kernel panic. It
actually occurred with a parallel distributed file system. It needs boundary
checking.

v2: Fix semicolon-missing bug.

Signed-off-by: Seiichi Ikarashi <[email protected]>

---
fs/nfsd/vfs.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6fbd81e..43393f3 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -811,12 +811,20 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
size = sd->len;

if (rqstp->rq_res.page_len == 0) {
+ if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
+ WARN_ON(1);
+ return -ENOMEM;
+ }
get_page(page);
put_page(*rqstp->rq_next_page);
*(rqstp->rq_next_page++) = page;
rqstp->rq_res.page_base = buf->offset;
rqstp->rq_res.page_len = size;
} else if (page != pp[-1]) {
+ if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
+ WARN_ON(1);
+ return -ENOMEM;
+ }
get_page(page);
if (*rqstp->rq_next_page)
put_page(*rqstp->rq_next_page);


2016-07-26 20:19:48

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2] Prevent rqstp->rq_pages[RPCSVC_MAXPAGES] overrun

Thanks for the report.

On Tue, Jul 26, 2016 at 11:38:11AM +0900, Seiichi Ikarashi wrote:
> If over-"RPCSVC_MAXPAGES" pages are sent from file system through pipe_buffer,
> nfsd_splice_actor() corrupts struct svc_rqst and results in kernel panic. It
> actually occurred with a parallel distributed file system. It needs boundary
> checking.

This check might be useful as defensive programming, but the bug was
elsewhere.

In theory this should be prevented by the "maxcount" calculations in
nfsd4_encode_read().

What version of the kernel did you see this happen on? What was the
client, and what was it doing? Any other hints on reproducing?

--b.

>
> v2: Fix semicolon-missing bug.
>
> Signed-off-by: Seiichi Ikarashi <[email protected]>
>
> ---
> fs/nfsd/vfs.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 6fbd81e..43393f3 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -811,12 +811,20 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> size = sd->len;
>
> if (rqstp->rq_res.page_len == 0) {
> + if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
> + WARN_ON(1);
> + return -ENOMEM;
> + }
> get_page(page);
> put_page(*rqstp->rq_next_page);
> *(rqstp->rq_next_page++) = page;
> rqstp->rq_res.page_base = buf->offset;
> rqstp->rq_res.page_len = size;
> } else if (page != pp[-1]) {
> + if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
> + WARN_ON(1);
> + return -ENOMEM;
> + }
> get_page(page);
> if (*rqstp->rq_next_page)
> put_page(*rqstp->rq_next_page);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-07-26 23:59:25

by Seiichi Ikarashi

[permalink] [raw]
Subject: Re: [PATCH v2] Prevent rqstp->rq_pages[RPCSVC_MAXPAGES] overrun

Hi Bruce,

On 2016-07-27 05:19, J. Bruce Fields wrote:
> Thanks for the report.
>
> On Tue, Jul 26, 2016 at 11:38:11AM +0900, Seiichi Ikarashi wrote:
>> If over-"RPCSVC_MAXPAGES" pages are sent from file system through pipe_buffer,
>> nfsd_splice_actor() corrupts struct svc_rqst and results in kernel panic. It
>> actually occurred with a parallel distributed file system. It needs boundary
>> checking.
>
> This check might be useful as defensive programming, but the bug was
> elsewhere.

Yah, I think the main factor exists in file system and/or VFS splice sides.
But I also think that NFS should defend itself against overlimit pages
because the limit is decided by NFS/SUNRPC, not by file system and others.

>
> In theory this should be prevented by the "maxcount" calculations in
> nfsd4_encode_read().

The "maxcount" looks just limiting the read length from the file system.
Is my understanding correct?

And the number of pages provided from the file system is up to the file system.
The file system can split the read data into an arbitrary number of pages.

>
> What version of the kernel did you see this happen on? What was the
> client, and what was it doing? Any other hints on reproducing?

It was a NFSv3 read access from a client of maybe-Linux-based system
against the server of RHEL6 system exporting a file system that I cannot
access its source code. Sorry for lack of info.


Seiichi.


>
> --b.
>
>>
>> v2: Fix semicolon-missing bug.
>>
>> Signed-off-by: Seiichi Ikarashi <[email protected]>
>>
>> ---
>> fs/nfsd/vfs.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 6fbd81e..43393f3 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -811,12 +811,20 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>> size = sd->len;
>>
>> if (rqstp->rq_res.page_len == 0) {
>> + if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
>> + WARN_ON(1);
>> + return -ENOMEM;
>> + }
>> get_page(page);
>> put_page(*rqstp->rq_next_page);
>> *(rqstp->rq_next_page++) = page;
>> rqstp->rq_res.page_base = buf->offset;
>> rqstp->rq_res.page_len = size;
>> } else if (page != pp[-1]) {
>> + if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
>> + WARN_ON(1);
>> + return -ENOMEM;
>> + }
>> get_page(page);
>> if (*rqstp->rq_next_page)
>> put_page(*rqstp->rq_next_page);
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> .
>


2016-07-27 00:26:38

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2] Prevent rqstp->rq_pages[RPCSVC_MAXPAGES] overrun

On Wed, Jul 27, 2016 at 08:57:26AM +0900, Seiichi Ikarashi wrote:
> Hi Bruce,
>
> On 2016-07-27 05:19, J. Bruce Fields wrote:
> > Thanks for the report.
> >
> > On Tue, Jul 26, 2016 at 11:38:11AM +0900, Seiichi Ikarashi wrote:
> >> If over-"RPCSVC_MAXPAGES" pages are sent from file system through pipe_buffer,
> >> nfsd_splice_actor() corrupts struct svc_rqst and results in kernel panic. It
> >> actually occurred with a parallel distributed file system. It needs boundary
> >> checking.
> >
> > This check might be useful as defensive programming, but the bug was
> > elsewhere.
>
> Yah, I think the main factor exists in file system and/or VFS splice sides.
> But I also think that NFS should defend itself against overlimit pages
> because the limit is decided by NFS/SUNRPC, not by file system and others.
>
> >
> > In theory this should be prevented by the "maxcount" calculations in
> > nfsd4_encode_read().
>
> The "maxcount" looks just limiting the read length from the file system.
> Is my understanding correct?

Right.

>
> And the number of pages provided from the file system is up to the file system.
> The file system can split the read data into an arbitrary number of pages.

Oh, so if we ask the filesystem for 3 bytes it might potentially return
those in 3 separate pages? Is that at all legal?

> > What version of the kernel did you see this happen on? What was the
> > client, and what was it doing? Any other hints on reproducing?
>
> It was a NFSv3 read access from a client of maybe-Linux-based system
> against the server of RHEL6 system exporting a file system that I cannot
> access its source code. Sorry for lack of info.

OK, understood.

--b.

>
>
> Seiichi.
>
>
> >
> > --b.
> >
> >>
> >> v2: Fix semicolon-missing bug.
> >>
> >> Signed-off-by: Seiichi Ikarashi <[email protected]>
> >>
> >> ---
> >> fs/nfsd/vfs.c | 8 ++++++++
> >> 1 file changed, 8 insertions(+)
> >>
> >> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >> index 6fbd81e..43393f3 100644
> >> --- a/fs/nfsd/vfs.c
> >> +++ b/fs/nfsd/vfs.c
> >> @@ -811,12 +811,20 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> >> size = sd->len;
> >>
> >> if (rqstp->rq_res.page_len == 0) {
> >> + if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
> >> + WARN_ON(1);
> >> + return -ENOMEM;
> >> + }
> >> get_page(page);
> >> put_page(*rqstp->rq_next_page);
> >> *(rqstp->rq_next_page++) = page;
> >> rqstp->rq_res.page_base = buf->offset;
> >> rqstp->rq_res.page_len = size;
> >> } else if (page != pp[-1]) {
> >> + if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
> >> + WARN_ON(1);
> >> + return -ENOMEM;
> >> + }
> >> get_page(page);
> >> if (*rqstp->rq_next_page)
> >> put_page(*rqstp->rq_next_page);
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >> the body of a message to [email protected]
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > .
> >

2016-07-27 01:18:06

by Seiichi Ikarashi

[permalink] [raw]
Subject: Re: [PATCH v2] Prevent rqstp->rq_pages[RPCSVC_MAXPAGES] overrun

On 2016-07-27 09:26, J. Bruce Fields wrote:
> On Wed, Jul 27, 2016 at 08:57:26AM +0900, Seiichi Ikarashi wrote:
>> Hi Bruce,
>>
>> On 2016-07-27 05:19, J. Bruce Fields wrote:
>>> Thanks for the report.
>>>
>>> On Tue, Jul 26, 2016 at 11:38:11AM +0900, Seiichi Ikarashi wrote:
>>>> If over-"RPCSVC_MAXPAGES" pages are sent from file system through pipe_buffer,
>>>> nfsd_splice_actor() corrupts struct svc_rqst and results in kernel panic. It
>>>> actually occurred with a parallel distributed file system. It needs boundary
>>>> checking.
>>>
>>> This check might be useful as defensive programming, but the bug was
>>> elsewhere.
>>
>> Yah, I think the main factor exists in file system and/or VFS splice sides.
>> But I also think that NFS should defend itself against overlimit pages
>> because the limit is decided by NFS/SUNRPC, not by file system and others.
>>
>>>
>>> In theory this should be prevented by the "maxcount" calculations in
>>> nfsd4_encode_read().
>>
>> The "maxcount" looks just limiting the read length from the file system.
>> Is my understanding correct?
>
> Right.
>
>>
>> And the number of pages provided from the file system is up to the file system.
>> The file system can split the read data into an arbitrary number of pages.
>
> Oh, so if we ask the filesystem for 3 bytes it might potentially return
> those in 3 separate pages? Is that at all legal?

I think the code actually permits such a split though I am not sure that
one-single-byte-per-page situation really happens. :-)
For example, on 4kB-page architecture, 1kB-data-per-page placement will result
in 4 times larger number of pages than expected.
I do not know whether it's legal or not.

I thought NFS needs to defend itself against such large numbers,
or, can somewhere in VFS splice merge such pages into a minimum set of pages?
I cannot find such code in VFS.

Opinions or suggestions?

Seiichi.

>
>>> What version of the kernel did you see this happen on? What was the
>>> client, and what was it doing? Any other hints on reproducing?
>>
>> It was a NFSv3 read access from a client of maybe-Linux-based system
>> against the server of RHEL6 system exporting a file system that I cannot
>> access its source code. Sorry for lack of info.
>
> OK, understood.
>
> --b.
>
>>
>>
>> Seiichi.
>>
>>
>>>
>>> --b.
>>>
>>>>
>>>> v2: Fix semicolon-missing bug.
>>>>
>>>> Signed-off-by: Seiichi Ikarashi <[email protected]>
>>>>
>>>> ---
>>>> fs/nfsd/vfs.c | 8 ++++++++
>>>> 1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>> index 6fbd81e..43393f3 100644
>>>> --- a/fs/nfsd/vfs.c
>>>> +++ b/fs/nfsd/vfs.c
>>>> @@ -811,12 +811,20 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>>>> size = sd->len;
>>>>
>>>> if (rqstp->rq_res.page_len == 0) {
>>>> + if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
>>>> + WARN_ON(1);
>>>> + return -ENOMEM;
>>>> + }
>>>> get_page(page);
>>>> put_page(*rqstp->rq_next_page);
>>>> *(rqstp->rq_next_page++) = page;
>>>> rqstp->rq_res.page_base = buf->offset;
>>>> rqstp->rq_res.page_len = size;
>>>> } else if (page != pp[-1]) {
>>>> + if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
>>>> + WARN_ON(1);
>>>> + return -ENOMEM;
>>>> + }
>>>> get_page(page);
>>>> if (*rqstp->rq_next_page)
>>>> put_page(*rqstp->rq_next_page);
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> .
>>>
> .
>


2016-07-28 18:25:12

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2] Prevent rqstp->rq_pages[RPCSVC_MAXPAGES] overrun

On Wed, Jul 27, 2016 at 10:06:07AM +0900, Seiichi Ikarashi wrote:
> On 2016-07-27 09:26, J. Bruce Fields wrote:
> > On Wed, Jul 27, 2016 at 08:57:26AM +0900, Seiichi Ikarashi wrote:
> >> Hi Bruce,
> >>
> >> On 2016-07-27 05:19, J. Bruce Fields wrote:
> >>> Thanks for the report.
> >>>
> >>> On Tue, Jul 26, 2016 at 11:38:11AM +0900, Seiichi Ikarashi wrote:
> >>>> If over-"RPCSVC_MAXPAGES" pages are sent from file system through pipe_buffer,
> >>>> nfsd_splice_actor() corrupts struct svc_rqst and results in kernel panic. It
> >>>> actually occurred with a parallel distributed file system. It needs boundary
> >>>> checking.
> >>>
> >>> This check might be useful as defensive programming, but the bug was
> >>> elsewhere.
> >>
> >> Yah, I think the main factor exists in file system and/or VFS splice sides.
> >> But I also think that NFS should defend itself against overlimit pages
> >> because the limit is decided by NFS/SUNRPC, not by file system and others.
> >>
> >>>
> >>> In theory this should be prevented by the "maxcount" calculations in
> >>> nfsd4_encode_read().
> >>
> >> The "maxcount" looks just limiting the read length from the file system.
> >> Is my understanding correct?
> >
> > Right.
> >
> >>
> >> And the number of pages provided from the file system is up to the file system.
> >> The file system can split the read data into an arbitrary number of pages.
> >
> > Oh, so if we ask the filesystem for 3 bytes it might potentially return
> > those in 3 separate pages? Is that at all legal?
>
> I think the code actually permits such a split though I am not sure that
> one-single-byte-per-page situation really happens. :-)
> For example, on 4kB-page architecture, 1kB-data-per-page placement will result
> in 4 times larger number of pages than expected.
> I do not know whether it's legal or not.
>
> I thought NFS needs to defend itself against such large numbers,
> or, can somewhere in VFS splice merge such pages into a minimum set of pages?
> I cannot find such code in VFS.
>
> Opinions or suggestions?

I don't know the splice interface well, but suspect the filesystem's in
the wrong here.

Checking for overflow of the page array isn't sufficient to catch any
such problems; if the filesystem gives us a smaller number of partial
pages, I'd guess (not having tested) that nfsd is just going to assume
the full page should be used, and end up sending uninitialized data back
to the client.

So I wouldn't necessarily be opposed to a simple check for a misbehaving
filesystem here, but if the required checks are more complicated and if
we've only ever seen it against an out-of-tree filesystem then I'm less
interested.

--b.

>
> Seiichi.
>
> >
> >>> What version of the kernel did you see this happen on? What was the
> >>> client, and what was it doing? Any other hints on reproducing?
> >>
> >> It was a NFSv3 read access from a client of maybe-Linux-based system
> >> against the server of RHEL6 system exporting a file system that I cannot
> >> access its source code. Sorry for lack of info.
> >
> > OK, understood.
> >
> > --b.
> >
> >>
> >>
> >> Seiichi.
> >>
> >>
> >>>
> >>> --b.
> >>>
> >>>>
> >>>> v2: Fix semicolon-missing bug.
> >>>>
> >>>> Signed-off-by: Seiichi Ikarashi <[email protected]>
> >>>>
> >>>> ---
> >>>> fs/nfsd/vfs.c | 8 ++++++++
> >>>> 1 file changed, 8 insertions(+)
> >>>>
> >>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>>> index 6fbd81e..43393f3 100644
> >>>> --- a/fs/nfsd/vfs.c
> >>>> +++ b/fs/nfsd/vfs.c
> >>>> @@ -811,12 +811,20 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> >>>> size = sd->len;
> >>>>
> >>>> if (rqstp->rq_res.page_len == 0) {
> >>>> + if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
> >>>> + WARN_ON(1);
> >>>> + return -ENOMEM;
> >>>> + }
> >>>> get_page(page);
> >>>> put_page(*rqstp->rq_next_page);
> >>>> *(rqstp->rq_next_page++) = page;
> >>>> rqstp->rq_res.page_base = buf->offset;
> >>>> rqstp->rq_res.page_len = size;
> >>>> } else if (page != pp[-1]) {
> >>>> + if (rqstp->rq_next_page > &rqstp->rq_pages[RPCSVC_MAXPAGES-1]) {
> >>>> + WARN_ON(1);
> >>>> + return -ENOMEM;
> >>>> + }
> >>>> get_page(page);
> >>>> if (*rqstp->rq_next_page)
> >>>> put_page(*rqstp->rq_next_page);
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >>>> the body of a message to [email protected]
> >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>> .
> >>>
> > .
> >

2016-08-01 10:14:38

by Seiichi Ikarashi

[permalink] [raw]
Subject: Re: [PATCH v2] Prevent rqstp->rq_pages[RPCSVC_MAXPAGES] overrun

On 2016-07-29 03:25, J. Bruce Fields wrote:
> On Wed, Jul 27, 2016 at 10:06:07AM +0900, Seiichi Ikarashi wrote:
>> On 2016-07-27 09:26, J. Bruce Fields wrote:
>>> On Wed, Jul 27, 2016 at 08:57:26AM +0900, Seiichi Ikarashi wrote:
>>>> Hi Bruce,
>>>>
>>>> On 2016-07-27 05:19, J. Bruce Fields wrote:
>>>>> Thanks for the report.
>>>>>
>>>>> On Tue, Jul 26, 2016 at 11:38:11AM +0900, Seiichi Ikarashi wrote:
>>>>>> If over-"RPCSVC_MAXPAGES" pages are sent from file system through pipe_buffer,
>>>>>> nfsd_splice_actor() corrupts struct svc_rqst and results in kernel panic. It
>>>>>> actually occurred with a parallel distributed file system. It needs boundary
>>>>>> checking.
>>>>>
>>>>> This check might be useful as defensive programming, but the bug was
>>>>> elsewhere.
>>>>
>>>> Yah, I think the main factor exists in file system and/or VFS splice sides.
>>>> But I also think that NFS should defend itself against overlimit pages
>>>> because the limit is decided by NFS/SUNRPC, not by file system and others.
>>>>
>>>>>
>>>>> In theory this should be prevented by the "maxcount" calculations in
>>>>> nfsd4_encode_read().
>>>>
>>>> The "maxcount" looks just limiting the read length from the file system.
>>>> Is my understanding correct?
>>>
>>> Right.
>>>
>>>>
>>>> And the number of pages provided from the file system is up to the file system.
>>>> The file system can split the read data into an arbitrary number of pages.
>>>
>>> Oh, so if we ask the filesystem for 3 bytes it might potentially return
>>> those in 3 separate pages? Is that at all legal?
>>
>> I think the code actually permits such a split though I am not sure that
>> one-single-byte-per-page situation really happens. :-)
>> For example, on 4kB-page architecture, 1kB-data-per-page placement will result
>> in 4 times larger number of pages than expected.
>> I do not know whether it's legal or not.
>>
>> I thought NFS needs to defend itself against such large numbers,
>> or, can somewhere in VFS splice merge such pages into a minimum set of pages?
>> I cannot find such code in VFS.
>>
>> Opinions or suggestions?
>
> I don't know the splice interface well, but suspect the filesystem's in
> the wrong here.
>
> Checking for overflow of the page array isn't sufficient to catch any
> such problems; if the filesystem gives us a smaller number of partial
> pages, I'd guess (not having tested) that nfsd is just going to assume
> the full page should be used, and end up sending uninitialized data back
> to the client.

I see. Now I understood why nfsd_splice_actor() ignores buf->offset if
rqstp->rq_res.page_len != 0. It premises consecutive, non-sparse data
filling in multi-pages through pipe_buffer.

I agree that checking for the overflow only is definitely insufficient.

>
> So I wouldn't necessarily be opposed to a simple check for a misbehaving
> filesystem here, but if the required checks are more complicated and if
> we've only ever seen it against an out-of-tree filesystem then I'm less
> interested.

Yah, it was an independent file system case.

Seeing the linus tree, most file systems are just using generic_file_splice_read()
as splice_read method. So the call sequence will be...

...
=> splice_direct_to_actor()
=> do_splice_to()
=> generic_file_splice_read()
=> __generic_file_splice_read()

Here __generic_file_splice_read() looks permitting partial pages.
So it might be possible theoretically with one of in-tree file systems,
but I'm no sure...

Any comment from file system guys are welcome :-)

Seiichi.