Sorry, resending. My mailer settings trashed my earlier email.
Hi,
A kernel crash is observed on the 3.1-rc4 kernel when HIGHMEM is enabled and
the kernel is booted with NFS on omap4430sdp. The issue happens in the
scenario below.
In file net/sunrpc/xprtsock.c,
static int xs_send_pagedata( xxx, struct xdr_buf *xdr, ..)
{
struct page **ppage;
....
.....
ppage = xdr->pages + (base >> PAGE_SHIFT);
....
err = sock->ops->sendpage(sock, *ppage, base, len, flags);
...
}
1) In the code above, the *ppage value passed to the ops->sendpage
function is eventually handed to kmap() by the lower-level code to
get the virtual address of the page.
2) In some corner cases the value of *ppage is NULL.
3) When highmem is enabled and a NULL pointer is passed to
kmap(), kmap() crashes. When highmem is disabled, kmap()
returns a junk value for the NULL pointer instead.
Highmem enabled,  kmap(NULL) -----> kernel crashes.
Highmem disabled, kmap(NULL) -----> junk value is returned.
Subsequently this message is observed on
the console:
"RPC call returned error 14"
4) The question is why *ppage == NULL is passed from the code
above to the lower layers.
Shouldn't that code have handled *ppage == NULL, so that kmap()
never receives a NULL pointer?
Thanks,
Sricharan
Hi Trond,
[....]
>> 1) In the above piece of code, the *ppage value from ops->sendpage
>> function is finally passed on to Kmap by the lower level code to
>> get the virtual address of the page.
>> 2) In some corner cases the value of *ppage pointer is NULL.
>> 3) When highmem is enabled and a NULL pointer is passed to
>> Kmap, then kmap finally crashes. But in the case when highmem
>> is disabled, then kmap returns a junk value for NULL pointer.
>>
>> Highmem Enabled , kmap( NULL )-----> kernel crashes.
>>
>> Highmem disabled, kmap( NULL )-----> junk value is returned.
>> Subsequently this message is observed on
>> the console.
>>
>> "RPC call returned error 14"
>>
>> 4) Now the question is why is the value of *ppage = NULL is passed
>> from the above piece of code to lower layers.
>> Should that not have handled *ppage = NULL? and kmap should not
>> have received a NULL pointer?
>
>I wouldn't expect *ppage to be NULL under any circumstances, so I'm
>really curious as to what is happening here.
>
>Could you perhaps add a printk() to that section of code to print out
>the values of 'xdr->page_base', 'xdr->page_len', 'len' and 'remainder'
>in the case where *ppage == NULL?
>
Thanks for the response.
I added a printk() just before the call
err = sock->ops->sendpage(sock, *ppage, base, len, flags);
Here are the values when *ppage is NULL:
xdr->page_base=0xCE9 xdr->page_len=0x400 len=0xE9 remainder=0x0
Thanks,
Sricharan
On Mon, 2011-09-12 at 11:46 +0530, Sricharan R wrote:
> Hi Trond,
> [....]
>
> >> 1) In the above piece of code, the *ppage value from ops->sendpage
> >> function is finally passed on to Kmap by the lower level code to
> >> get the virtual address of the page.
> >> 2) In some corner cases the value of *ppage pointer is NULL.
> >> 3) When highmem is enabled and a NULL pointer is passed to
> >> Kmap, then kmap finally crashes. But in the case when highmem
> >> is disabled, then kmap returns a junk value for NULL pointer.
> >>
> >> Highmem Enabled , kmap( NULL )-----> kernel crashes.
> >>
> >> Highmem disabled, kmap( NULL )-----> junk value is returned.
> >> Subsequently this message is observed on
> >> the console.
> >>
> >> "RPC call returned error 14"
> >>
> >> 4) Now the question is why is the value of *ppage = NULL is passed
> >> from the above piece of code to lower layers.
> >> Should that not have handled *ppage = NULL? and kmap should not
> >> have received a NULL pointer?
> >
> >I wouldn't expect *ppage to be NULL under any circumstances, so I'm
> >really curious as to what is happening here.
> >
> >Could you perhaps add a printk() to that section of code to print out
> >the values of 'xdr->page_base', 'xdr->page_len', 'len' and 'remainder'
> >in the case where *ppage == NULL?
> >
>
>
> Thanks for the response.
> I added a printk just before err = sock->ops->sendpage(sock, *ppage, base,
> len, flags);
> So here are values when *ppage is NULL.
>
> xdr->page_base= 0xCE9 xdr->page_len=0x400 len=0xE9 remainder=0x0.
>
> Thanks,
> Sricharan
Can you please tell me what the mount options are for this setup?
Are you running any applications that might be using O_DIRECT writes?
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
On Mon, 2011-09-12 at 10:41 -0400, Trond Myklebust wrote:
> On Mon, 2011-09-12 at 11:46 +0530, Sricharan R wrote:
> > Thanks for the response.
> > I added a printk just before err = sock->ops->sendpage(sock, *ppage, base,
> > len, flags);
> > So here are values when *ppage is NULL.
> >
> > xdr->page_base= 0xCE9 xdr->page_len=0x400 len=0xE9 remainder=0x0.
> >
> > Thanks,
> > Sricharan
>
> Can you please tell me what the mount options are for this setup?
I'm guessing you've got wsize=1024, in which case, can you please try
the following patch?
Cheers
Trond
8<--------------------------------------------------------------------------
[..]
>>
>> Can you please tell me what the mount options are for this setup?
>
>I'm guessing you've got wsize=1024, in which case, can you please try
>the following patch?
>
The mount option for NFS is rw.
Yes, wsize=1024 in my setup when the issue happened.
I tried your patch and could no longer reproduce the issue,
whereas without it the issue happened quite frequently.
So I think the patch fixes the issue.
Thanks a lot for your help.
>Cheers
> Trond
>8<--------------------------------------------------------------------------
>From 7b4a9c76b55dd254431902552528137a2ea5e55d Mon Sep 17 00:00:00 2001
>From: Trond Myklebust <[email protected]>
>Date: Mon, 12 Sep 2011 11:47:53 -0400
>Subject: [PATCH] NFS: Fix a typo in nfs_flush_multi
>
>Fix a typo which causes an Oops in the RPC layer, when using wsize < 4k.
>
>Signed-off-by: Trond Myklebust <[email protected]>
>---
> fs/nfs/write.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>index b39b37f..c9bd2a6 100644
>--- a/fs/nfs/write.c
>+++ b/fs/nfs/write.c
>@@ -958,7 +958,7 @@ static int nfs_flush_multi(struct nfs_pageio_descriptor *desc, struct list_head
> if (!data)
> goto out_bad;
> data->pagevec[0] = page;
>- nfs_write_rpcsetup(req, data, wsize, offset, desc->pg_ioflags);
>+ nfs_write_rpcsetup(req, data, len, offset, desc->pg_ioflags);
> list_add(&data->list, res);
> requests++;
> nbytes -= len;
>--
>1.7.6
>
>
>
>--
>Trond Myklebust
>Linux NFS client maintainer
>
>NetApp
>[email protected]
>http://www.netapp.com
On Fri, 2011-09-09 at 18:40 +0530, R, Sricharan wrote:
> Sorry resending again. My mailer settings thrashed my earlier email.
>
> Hi,
> A kernel crash is observed on 3.1rc4 kernel when HIGHMEM is enabled and
> kernel is booted with a NFS on omap4430sdp. The issue happens in the below
> scenario.
>
> In file net/sunrpc/xprtsock.c,
> static int xs_send_pagedata( xxx, struct xdr_buf *xdr, ..)
> {
> Struct page **ppage;
> ....
> .....
> ppage = xdr->pages + (base >> PAGE_SHIFT);
> ....
> err = sock->ops->sendpage(sock, *ppage, base, len, flags);
>
> ...
> }
>
> 1) In the above piece of code, the *ppage value from ops->sendpage
> function is finally passed on to Kmap by the lower level code to
> get the virtual address of the page.
> 2) In some corner cases the value of *ppage pointer is NULL.
> 3) When highmem is enabled and a NULL pointer is passed to
> Kmap, then kmap finally crashes. But in the case when highmem
> is disabled, then kmap returns a junk value for NULL pointer.
>
> Highmem Enabled , kmap( NULL )-----> kernel crashes.
>
> Highmem disabled, kmap( NULL )-----> junk value is returned.
> Subsequently this message is observed on
> the console.
>
> "RPC call returned error 14"
>
> 4) Now the question is why is the value of *ppage = NULL is passed
> from the above piece of code to lower layers.
> Should that not have handled *ppage = NULL? and kmap should not
> have received a NULL pointer?
I wouldn't expect *ppage to be NULL under any circumstances, so I'm
really curious as to what is happening here.
Could you perhaps add a printk() to that section of code to print out
the values of 'xdr->page_base', 'xdr->page_len', 'len' and 'remainder'
in the case where *ppage == NULL?
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com