2001-10-01 23:40:30

by H. Peter Anvin

Subject: NFSv3 and linux-2.4.10-ac3 => oops

Hello everyone,

I have a reproducible (and rather quick) oops on a system running
linux-2.4.10-ac3, which seems to be NFS (v3) related; although
ksymoops core dumps when I try to use it, I have manually decoded
the dump to indicate that it happens in rwsem_down_read_failed
called from nfs_file_write. Rather than posting too much here,
I have put as much information as I have been able to gather at:

ftp://ftp.zytor.com/pub/hpa/oops/

This includes the configuration, System.map, oops text etc.


2001-10-02 09:40:38

by Trond Myklebust

Subject: Re: NFSv3 and linux-2.4.10-ac3 => oops

>>>>> " " == H Peter Anvin writes:

> Hello everyone, I have a reproducible (and rather quick) oops
> on a system running linux-2.4.10-ac3, which seems to be NFS
> (v3) related; although ksymoops core dumps when I try to use
> it, I have manually decoded the dump to indicate that it
> happens in rwsem_down_read_failed called from nfs_file_write.
> Rather than posting too much here, I have put as much
> information as I have been able to gather at:

> ftp://ftp.zytor.com/pub/hpa/oops/

I'm trying to look at this, but it seems a hopeless mess: there are no
calls to any read/write semaphore routines in the NFS code.

AFAICS the second stack return point corresponds to the call to
generic_file_write() in nfs_file_write(), so I'd guess that the Oops
is actually happening somewhere there...

Hmm... Looking at the code in generic_file_write(), I see that Alan
hasn't merged in the kmap() stuff in generic_file_write() from
Linus. At the same time, the nfs_prepare_write() seems to have been
synced with Linus, and so the kmap() that used to be there has
disappeared.

As your config indicates that you *are* using CONFIG_HIGHMEM4G,
perhaps one ought to start with a patch that fixes the obvious bug (in
the hope that it'll at least clean up the next Oops)...

Cheers,
Trond

--- linux-2.4.10-hpa/fs/nfs/file.c.orig Sun Sep 23 18:48:01 2001
+++ linux-2.4.10-hpa/fs/nfs/file.c Tue Oct 2 11:33:43 2001
@@ -155,7 +155,12 @@
  */
 static int nfs_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to)
 {
-	return nfs_flush_incompatible(file, page);
+	int status;
+	kmap(page);
+	status = nfs_flush_incompatible(file, page);
+	if (status)
+		kunmap(page);
+	return status;
 }

 static int nfs_commit_write(struct file *file, struct page *page, unsigned offset, unsigned to)

2001-10-02 11:32:49

by Matt Bernstein

Subject: Re: NFSv3 and linux-2.4.10-ac3 => oops

I wonder if this is related to oopses I sent in in the last two days?
We're running 4GB setups with NFSv3 client and server on our fileservers,
and the oopses might (don't really have strong correlation evidence yet)
be related to when our fileservers push online backups to cheaper NFS
servers (running the same kernel based on 2.4.9-ac10). Is there a last
known good kernel I can try on my production systems while I try to
reproduce the problem on smaller boxes? Or would you like me to try your
patch?

Matt

At 11:40 +0200 Trond Myklebust wrote:

>>>>>> " " == H Peter Anvin writes:
>
> > Hello everyone, I have a reproducible (and rather quick) oops
> > on a system running linux-2.4.10-ac3, which seems to be NFS
> > (v3) related; although ksymoops core dumps when I try to use
[snip]
> > ftp://ftp.zytor.com/pub/hpa/oops/
>
>I'm trying to look at this, but it seems a hopeless mess: there are no
>calls to any read/write semaphore routines in the NFS code.
>
>AFAICS the second stack return point corresponds to the call to
>generic_file_write() in nfs_file_write(), so I'd guess that the Oops
>is actually happening somewhere there...
>
>Hmm... Looking at the code in generic_file_write(), I see that Alan
>hasn't merged in the kmap() stuff in generic_file_write() from
>Linus. At the same time, the nfs_prepare_write() seems to have been
>synced with Linus, and so the kmap() that used to be there has
>disappeared.

2001-10-02 12:03:47

by Trond Myklebust

Subject: Re: NFSv3 and linux-2.4.10-ac3 => oops

>>>>> " " == Matt Bernstein writes:

> I wonder if this is related to oopses I sent in in the last two
> days? We're running 4GB setups with NFSv3 client and server on
> our fileservers, and the oopses might (don't really have strong
> correlation evidence yet) be related to when our fileservers
> push online backups to cheaper NFS servers (running the same
> kernel based on 2.4.9-ac10). Is there a last known good kernel
> I can try on my production systems while I try to reproduce the
> problem on smaller boxes? Or would you like me to try your
> patch?

Linus changed nfs_prepare_write() in his tree around 2.4.10-pre5. From
what I can see, Alan merged that particular patch into 2.4.9-ac11 (but
without merging in the related changes to linux/mm/filemap.c).

Argh. I see that in the patch I put out earlier today, I forgot to
also revert the removal of the kunmap() in nfs_commit_write() (sorry -
my coffee was particularly weak this morning).

Please apply the following patch to the 'ac' tree instead.

People who use Linus' tree should *not* apply this patch!!!!!

Cheers,
Trond

diff -u --recursive --new-file linux-2.4.10-reclaim/fs/nfs/file.c linux-2.4.10-ac4/fs/nfs/file.c
--- linux-2.4.10-reclaim/fs/nfs/file.c Sun Sep 23 18:48:01 2001
+++ linux-2.4.10-ac4/fs/nfs/file.c Tue Oct 2 13:40:58 2001
@@ -155,7 +155,12 @@
  */
 static int nfs_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to)
 {
-	return nfs_flush_incompatible(file, page);
+	int status;
+	kmap(page);
+	status = nfs_flush_incompatible(file, page);
+	if (status)
+		kunmap(page);
+	return status;
 }

 static int nfs_commit_write(struct file *file, struct page *page, unsigned offset, unsigned to)
@@ -164,6 +169,7 @@
 	loff_t pos = ((loff_t)page->index<<PAGE_CACHE_SHIFT) + to;
 	struct inode *inode = page->mapping->host;

+	kunmap(page);
 	lock_kernel();
 	status = nfs_updatepage(file, page, offset, to-offset);
 	unlock_kernel();
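
The pairing that the patch above restores can be checked with a small userspace model. This is only a sketch, not kernel code: `struct page` here is a stand-in holding a hypothetical `map_count` field, the `kmap()`/`kunmap()` stubs merely count calls, and `write_once()` with its `unmap_in_commit` switch is an illustrative helper that does not exist in the kernel.

```c
/* Userspace stand-in for a highmem page: only the mapping count matters. */
struct page {
	int map_count;	/* hypothetical: kmap() calls minus kunmap() calls */
};

/* Counting stubs; the real kmap()/kunmap() manage kernel mappings. */
static void kmap(struct page *page)   { page->map_count++; }
static void kunmap(struct page *page) { page->map_count--; }

/* Mirrors the patched nfs_prepare_write(): map first, unmap only on failure. */
static int prepare_write(struct page *page, int flush_status)
{
	int status;

	kmap(page);
	status = flush_status;	/* stands in for nfs_flush_incompatible() */
	if (status)
		kunmap(page);
	return status;
}

/*
 * Mirrors nfs_commit_write(). unmap_in_commit == 0 models the earlier,
 * incomplete patch; unmap_in_commit == 1 models the patch above.
 */
static void commit_write(struct page *page, int unmap_in_commit)
{
	if (unmap_in_commit)
		kunmap(page);
}

/* Drive one write attempt and return the number of mappings left over. */
int write_once(int flush_status, int unmap_in_commit)
{
	struct page page = { 0 };

	if (prepare_write(&page, flush_status) == 0)
		commit_write(&page, unmap_in_commit);
	return page.map_count;
}
```

In this model, `write_once(0, 1)` leaves no mapping behind, and neither does a failing flush such as `write_once(-5, 1)`. `write_once(0, 0)` leaves one mapping outstanding, which is the imbalance the missing kunmap() in nfs_commit_write() would cause on every successful write.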

2001-10-02 13:42:59

by Alan Cox

Subject: Re: NFSv3 and linux-2.4.10-ac3 => oops

> I wonder if this is related to oopses I sent in in the last two days?
> We're running 4GB setups with NFSv3 client and server on our fileservers,
> and the oopses might (don't really have strong correlation evidence yet)
> be related to when our fileservers push online backups to cheaper NFS
> servers (running the same kernel based on 2.4.9-ac10). Is there a last
> known good kernel I can try on my production systems while I try to
> reproduce the problem on smaller boxes? Or would you like me to try your
> patch?

Are these oopses new as of the 2.4.10 based tree? If so, do you see them
with 2.4.10-ac3?

Right now we have a sort of bug candidate set:

                VM      NFS     LOCKING
2.4.9-ac10      old     old     old
2.4.9-ac16      new     old     old
2.4.9-ac18      new     old     half-way
2.4.10-ac3      new     new     new

That may help deduce which problem it is.

Alan
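
One way to read the candidate table: given one tree that is known good and one that oopses, the suspects are the components whose state differs between the two. The sketch below encodes the table that way. The `enum` names, the `trees` array and the `suspects()` helper are all illustrative assumptions and not part of any kernel tool.

```c
enum state { OLD, HALF_WAY, NEW };
enum component { VM, NFS, LOCKING, NCOMP };

struct tree {
	const char *name;
	enum state comp[NCOMP];	/* indexed by enum component */
};

/* The candidate set from the message above, oldest tree first. */
static const struct tree trees[] = {
	{ "2.4.9-ac10", { OLD, OLD, OLD } },
	{ "2.4.9-ac16", { NEW, OLD, OLD } },
	{ "2.4.9-ac18", { NEW, OLD, HALF_WAY } },
	{ "2.4.10-ac3", { NEW, NEW, NEW } },
};

/*
 * Return a bitmask (bit c set for component c) of the components whose
 * state differs between a known-good tree and a tree that oopses.
 * good and bad are indices into trees[].
 */
unsigned suspects(int good, int bad)
{
	unsigned mask = 0;
	int c;

	for (c = 0; c < NCOMP; c++)
		if (trees[good].comp[c] != trees[bad].comp[c])
			mask |= 1u << c;
	return mask;
}
```

For example, if 2.4.9-ac18 were stable and 2.4.10-ac3 oopsed, `suspects(2, 3)` would flag NFS and LOCKING but not VM. An oops that already shows up on 2.4.9-ac10, where every column is "old", cannot be pinned on any of the new code by this table alone.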

2001-10-02 13:49:49

by Alan Cox

Subject: Re: NFSv3 and linux-2.4.10-ac3 => oops

> what I can see, Alan merged that particular patch into 2.4.9-ac11 (but
> without merging in the related changes to linux/mm/filemap.c).

OK, it's probably better that I merge the related mm/filemap.c changes if
someone has the relevant bits handy. That helps to keep the differences down.

2001-10-02 14:02:03

by Matt Bernstein

Subject: Re: NFSv3 and linux-2.4.10-ac3 => oops

At 14:47 +0100 Alan Cox wrote:

>> I wonder if this is related to oopses I sent in in the last two days?
[snip]
>
>Are these oopses new as of the 2.4.10 based tree? If so, do you see them
>with 2.4.10-ac3?

Mine were from 2.4.9-ac10 + ext3-0.9.9 + ext3 speedup patch (which is in
0.9.10) + "experimental VM patch" (see the ext3 for 2.4 page) + jfs-1.0.4
(compiled with gcc 2.96-85, romfs initrd, everything possible as modules)

I've booted two of our servers into 2.4.9-ac18 compiled with egcs-1.1.2
(so far without Trond's patches) and will report anything odd.

Incidentally, a third server running my 2.4.9-ac10 build has oopsed (output
below). What these three servers have in common is that they're all using
ICP-Vortex gdth raid arrays, and no IDE. I have four or five other setups
with the exact same kernel (well, two of them compiled for UP Athlon
rather than SMP Coppermine) with IDE root and further SCSI partitions
(some aic7xxx, some gdth) which have all been very stable. We haven't
ruled out a cabling/termination problem, but it's a bit spooky.

Thanks for the responses :)

Matt


ksymoops 2.4.1 on i686 2.4.9-ac10-jfs. Options used
-V (default)
-K (specified)
-L (specified)
-o /lib/modules/2.4.9-ac10-jfs/ (default)
-m /boot/System.map-2.4.9-ac10-jfs (default)

No modules in ksyms, skipping objects
Unable to handle kernel paging request at virtual address 756f6a00
756f6a00
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<756f6a00>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 756f6a00 ebx: c7e219cc ecx: d01fc594 edx: d01fc594
esi: c7e219b4 edi: d01fc584 ebp: ffff4909 esp: c1969f68
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 5, stackpage=c1969000)
Stack: c015a271 c7e219b4 d01fc584 c022c940 00000206 ffffffff 00003044 c11fa670
       c11fa670 00000000 00000001 000000c0 00000001 c0231d60 0008e000 c015a891
       00000000 c0139306 00000000 000000c0 000000c0 00000000 c1968000 ffffffff
Call Trace: [<c015a271>] [<c015a891>] [<c0139306>] [<c01393ae>] [<c0105000>]
   [<c0105000>] [<c0105926>] [<c0139340>]
Code: Bad EIP value.

>>EIP; 756f6a00 Before first symbol <=====
Trace; c015a271 <prune_dcache+141/270>
Trace; c015a891 <shrink_dcache_memory+21/40>
Trace; c0139306 <do_try_to_free_pages+26/60>
Trace; c01393ae <kswapd+6e/f0>
Trace; c0105000 <_stext+0/0>
Trace; c0105000 <_stext+0/0>
Trace; c0105926 <kernel_thread+26/30>
Trace; c0139340 <kswapd+0/f0>

<1>Unable to handle kernel paging request at virtual address 756f6a00
756f6a00
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<756f6a00>]
EFLAGS: 00010206
eax: 756f6a00 ebx: de844de0 ecx: c0819e54 edx: c0819e54
esi: de844dc8 edi: c0819e44 ebp: 00000000 esp: cc7c3e74
ds: 0018 es: 0018 ss: 0018
Process bonnie++ (pid: 17138, stackpage=cc7c3000)
Stack: c015a271 de844dc8 c0819e44 00000082 c01383ba c10143c0 00000082 c10143dc
       c1509d24 00000000 00000000 000000d2 00015ec2 00000000 000000d2 c015a891
       00000000 c0139306 00000000 000000d2 000000d2 00000001 cc7c2000 00000010
Call Trace: [<c015a271>] [<c01383ba>] [<c015a891>] [<c0139306>] [<c0139488>]
   [<c013a13e>] [<c0131f0b>] [<c0109437>] [<e099ae42>] [<c0142656>] [<c01128bc>]
   [<c010772b>]
Code: Bad EIP value.

>>EIP; 756f6a00 Before first symbol <=====
Trace; c015a271 <prune_dcache+141/270>
Trace; c01383ba <try_to_release_page+3a/60>
Trace; c015a891 <shrink_dcache_memory+21/40>
Trace; c0139306 <do_try_to_free_pages+26/60>
Trace; c0139488 <try_to_free_pages+28/40>
Trace; c013a13e <__alloc_pages+1be/250>
Trace; c0131f0b <generic_file_write+35b/610>
Trace; c0109437 <do_IRQ+1a7/1c0>
Trace; e099ae42 <END_OF_CODE+206ce85a/????>
Trace; c0142656 <sys_write+96/d0>
Trace; c01128bc <smp_apic_timer_interrupt+ec/110>
Trace; c010772b <system_call+33/38>
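
Each `Trace;` line above is the result of looking up the nearest symbol at or below an address and printing the remaining offset, which is also what decoding an oops by hand against System.map amounts to. The sketch below shows that lookup. The `struct ksym` type, the `symtab` array and the `resolve()` helper are hypothetical names, and the start addresses are not copied from a System.map but derived from the trace (return address minus the printed offset, so prune_dcache starts at c015a271 - 0x141 = c015a130).

```c
#include <stddef.h>

struct ksym {
	unsigned long addr;	/* start address of the symbol */
	const char *name;
};

/*
 * A deliberately partial table, sorted by address. A real lookup would
 * load every entry from System.map.
 */
static const struct ksym symtab[] = {
	{ 0xc01392e0UL, "do_try_to_free_pages" },
	{ 0xc0139340UL, "kswapd" },
	{ 0xc015a130UL, "prune_dcache" },
	{ 0xc015a870UL, "shrink_dcache_memory" },
};

/*
 * Find the last symbol whose start address is <= addr. On success, store
 * the distance from the symbol start in *offset and return the entry.
 * Return NULL if addr lies before the first symbol.
 */
const struct ksym *resolve(unsigned long addr, unsigned long *offset)
{
	const struct ksym *best = NULL;
	size_t i;

	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
		if (symtab[i].addr > addr)
			break;
		best = &symtab[i];
	}
	if (best)
		*offset = addr - best->addr;
	return best;
}
```

With this table, `resolve(0xc015a271UL, &off)` yields prune_dcache with an offset of 0x141, matching the first trace entry, and the faulting EIP 756f6a00 resolves to NULL, which corresponds to ksymoops reporting "Before first symbol". Because the table is partial, addresses belonging to symbols that are not listed would be attributed to the preceding entry, so a complete System.map is needed for real use.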



2001-10-02 14:04:13

by Trond Myklebust

Subject: Re: NFSv3 and linux-2.4.10-ac3 => oops

>>>>> " " == Alan Cox writes:

>> what I can see, Alan merged that particular patch into
>> 2.4.9-ac11 (but without merging in the related changes to
>> linux/mm/filemap.c).

> OK, it's probably better that I merge the related mm/filemap.c
> changes if someone has the relevant bits handy. That helps to
> keep the differences down.

The following ought to be sufficient.

Cheers,
Trond

--- linux-2.4.10-ac/mm/filemap.c Tue Oct 2 15:53:04 2001
+++ linux-2.4.10-new/mm/filemap.c Tue Oct 2 15:56:29 2001
@@ -2673,10 +2673,10 @@
PAGE_BUG(page);
}

+ kaddr = kmap(page);
status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes);
if (status)
goto sync_failure;
- kaddr = page_address(page);
status = __copy_from_user(kaddr+offset, buf, bytes);
flush_dcache_page(page);
if (status) {
@@ -2695,6 +2695,7 @@
buf += status;
}
unlock:
+ kunmap(page);
/* Mark it unlocked again and drop the page.. */
UnlockPage(page);
if (deactivate)
@@ -2728,9 +2729,9 @@
fail_write:
status = -EFAULT;
ClearPageUptodate(page);
- kunmap(page);
goto unlock;
sync_failure:
+ kunmap(page);
UnlockPage(page);
deactivate_page(page);
page_cache_release(page);