2005-10-12 23:02:58

by Gabriel A. Devenyi

Subject: [OOPS] nfsv4 in linux 2.6.13 (-ck7)

This oops seems to occur during heavy i/o load over nfsv4.

nfs-utils version 1.0.7

Oops follows; what other information is needed?

[kernel] Unable to handle kernel paging request at 0000000000100108 RIP:
[kernel] <ffffffff80185e98>{generic_drop_inode+56}
[kernel] PGD 34e3b067 PUD 34e68067 PMD 0
[kernel] CPU 0
[kernel] Modules linked in: nvidia snd_seq_midi snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd
[kernel] Pid: 179, comm: kswapd0 Tainted: P 2.6.13-ck7
[kernel] RIP: 0010:[<ffffffff80185e98>] <ffffffff80185e98>{generic_drop_inode+56}
[kernel] RSP: 0018:ffff81003fcd7b68 EFLAGS: 00010246
[kernel] RAX: 0000000000100100 RBX: ffff81001a58c950 RCX: 0000000000200200
[kernel] RDX: ffff81001a58c960 RSI: ffff81003eb84000 RDI: ffff81001a58c950
[kernel] RBP: ffff81001a58c950 R08: 00000000fffffffa R09: ffff81001a58ca68
[kernel] R10: 0000000000000001 R11: ffffffff80185e60 R12: 0000000000000000
[kernel] R13: ffff81001a58c7d0 R14: ffff81001a58c860 R15: ffff81003f1f5200
[kernel] FS: 0000000040800960(0000) GS:ffffffff80494800(0000) knlGS:0000000056160040
[kernel] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[kernel] CR2: 0000000000100108 CR3: 0000000034de7000 CR4: 00000000000006e0
[kernel] Process kswapd0 (pid: 179, threadinfo ffff81003fcd6000, task ffff81003fcb2760)
[kernel] Stack: ffff81003f1f5c00 ffffffff801d7a25 00000001803e8238 ffff81003fcd7c18
[kernel] ffffffffffffffff ffff81003fcd7c18 ffff81003fcd7c00 ffff81001a58c938
[kernel] 0000000000000000 0000000000000000
[kernel] Call Trace:<ffffffff801d7a25>{__nfs_revalidate_inode+261} <ffffffff8014e5df>{find_get_pages_tag+31}
[kernel] <ffffffff8015781a>{pagevec_lookup_tag+26} <ffffffff8014e00e>{wait_on_page_writeback_range+206}
[kernel] <ffffffff801f11ba>{nfs_do_return_delegation+42} <ffffffff801f12e5>{nfs_inode_return_delegation+197}
[kernel] <ffffffff801d8a10>{nfs4_clear_inode+32} <ffffffff80184cfe>{clear_inode+158}
[kernel] <ffffffff8018594e>{dispose_list+94} <ffffffff80185b82>{shrink_icache_memory+434}
[kernel] <ffffffff8015806b>{shrink_slab+219} <ffffffff80159517>{balance_pgdat+695}
[kernel] <ffffffff801597a8>{kswapd+312} <ffffffff80143b30>{autoremove_wake_function+0}
[kernel] <ffffffff80143b30>{autoremove_wake_function+0} <ffffffff8010f30e>{child_rip+8}
[kernel] <ffffffff80159670>{kswapd+0} <ffffffff8010f306>{child_rip+0}
[kernel]
[kernel] Code: 48 89 48 08 48 89 01 48 8b 05 aa 43 26 00 48 89 50 08 48 89
[kernel] RIP <ffffffff80185e98>{generic_drop_inode+56} RSP <ffff81003fcd7b68>
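
The register values above can be read against the list poison constants. In 2.6-era kernels, list_del() leaves LIST_POISON1 (0x00100100) and LIST_POISON2 (0x00200200) in the next and prev fields of a removed entry. Here RAX and RCX hold exactly those values, and the faulting address in CR2 is RAX + 8. The first Code bytes (48 89 48 08) decode to a store of RCX at 8(%rax), which matches "next->prev = prev". The sketch below only checks that arithmetic. The struct and function names are hypothetical, and it assumes the x86_64 layout of struct list_head (two 8-byte pointers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Poison values stored by list_del() in a removed entry
 * (include/linux/list.h in 2.6-era kernels). */
#define LIST_POISON1 UINT64_C(0x00100100)
#define LIST_POISON2 UINT64_C(0x00200200)

/* Assumed x86_64 layout of struct list_head: two 8-byte pointers. */
struct list_head64 {
    uint64_t next;
    uint64_t prev;
};

/* Address written by "next->prev = prev" when next == LIST_POISON1. */
uint64_t poisoned_store_address(void)
{
    return LIST_POISON1 + offsetof(struct list_head64, prev);
}

/* Nonzero if the values are consistent with a list removal run on an
 * entry whose next/prev were already poisoned by an earlier list_del(). */
int looks_like_removal_of_poisoned_entry(uint64_t fault_addr,
                                         uint64_t next, uint64_t prev)
{
    return next == LIST_POISON1 && prev == LIST_POISON2 &&
           fault_addr == poisoned_store_address();
}

/* Self-check using CR2, RAX and RCX as reported in the oops. */
void check_reported_oops(void)
{
    assert(looks_like_removal_of_poisoned_entry(UINT64_C(0x100108),
                                                UINT64_C(0x100100),
                                                UINT64_C(0x200200)));
}
```

If the check holds, the fault is consistent with the inode being unlinked from a list a second time. The sketch does not identify which list or which caller did the first removal.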


--
Gabriel A. Devenyi
[email protected]


2005-10-12 23:24:29

by Chris Wright

Subject: Re: [OOPS] nfsv4 in linux 2.6.13 (-ck7)

* Gabriel A. Devenyi ([email protected]) wrote:
> This oops seems to occur during heavy i/o load over nfsv4.
>
> [kernel] Unable to handle kernel paging request at 0000000000100108 RIP:
> [kernel] <ffffffff80185e98>{generic_drop_inode+56}

There have been a couple recent reports of this, and a fix is in the works.

See the recent thread here:

http://lkml.org/lkml/2005/9/25/44

> [kernel] Modules linked in: nvidia
                              ^^^^^^
> [kernel] Pid: 179, comm: kswapd0 Tainted: P 2.6.13-ck7

Tainted kernel, when sending bug reports please be sure bug happens
w/out tainted kernel.

thanks,
-chris

2005-10-12 23:27:09

by Gabriel A. Devenyi

Subject: Re: [OOPS] nfsv4 in linux 2.6.13 (-ck7)

On October 12, 2005 19:24, Chris Wright wrote:
> * Gabriel A. Devenyi ([email protected]) wrote:
> > This oops seems to occur during heavy i/o load over nfsv4.
> >
> > [kernel] Unable to handle kernel paging request at 0000000000100108 RIP:
> > [kernel] <ffffffff80185e98>{generic_drop_inode+56}
>
> There have been a couple recent reports of this, and a fix is in the works.
>
> See the recent thread here:
>
> http://lkml.org/lkml/2005/9/25/44
>
> > [kernel] Modules linked in: nvidia
>                               ^^^^^^
> > [kernel] Pid: 179, comm: kswapd0 Tainted: P 2.6.13-ck7
>
> Tainted kernel, when sending bug reports please be sure bug happens
> w/out tainted kernel.

Of course, my apologies. However, this is a filesystem error; is it even conceivable that something such as the nvidia kernel driver could affect this?


> thanks,
> -chris
>
>

--
Gabriel A. Devenyi
[email protected]

2005-10-12 23:32:07

by Chris Wright

Subject: Re: [OOPS] nfsv4 in linux 2.6.13 (-ck7)

* Gabriel A. Devenyi ([email protected]) wrote:
> Of course, my apologies. However, this is a filesystem error; is it
> even conceivable that something such as the nvidia kernel driver
> could affect this?

In this case it's not very likely since others are seeing same problem
under load. However, a binary module can corrupt any kernel memory.
So as a general rule all bets are off with a binary module loaded.

thanks,
-chris

2005-10-12 23:37:44

by Gabriel A. Devenyi

Subject: Re: [OOPS] nfsv4 in linux 2.6.13 (-ck7)

On October 12, 2005 19:31, Chris Wright wrote:
> In this case it's not very likely since others are seeing same problem
> under load. However, a binary module can corrupt any kernel memory.
> So as a general rule all bets are off with a binary module loaded.

Thanks, I'll keep that in mind for next time. With regard to the patch in the other thread,
should I try to patch the client, the server, or both?

--
Gabriel A. Devenyi
[email protected]

2005-10-12 23:56:34

by Chris Wright

Subject: Re: [OOPS] nfsv4 in linux 2.6.13 (-ck7)

* Gabriel A. Devenyi ([email protected]) wrote:

> Thanks, I'll keep that in mind for next time. With regard to the
> patch in the other thread, should I try to patch the client, the
> server, or both?

Client side AFAIK. May want to check with the nfs folks to see if
they've got any specific testing they'd find useful.

thanks,
-chris

2005-10-14 14:05:15

by Gabriel A. Devenyi

Subject: Re: [OOPS] nfsv4 in linux 2.6.13 (-ck7)

Chris Wright wrote:
> Client side AFAIK. May want to check with the nfs folks to see if
> they've got any specific testing they'd find useful.

Well the patch seems to have cleared this problem up, do you happen to
know where the NFS folks can be located so I can provide further
testing/feedback? Thanks.


--
Gabriel A. Devenyi
[email protected]

2005-10-14 15:50:23

by Trond Myklebust

Subject: Re: [OOPS] nfsv4 in linux 2.6.13 (-ck7)

NFS: Fix Oopsable/unnecessary i_count manipulations in nfs_wait_on_inode()

Oopsable since nfs_wait_on_inode() can get called as part of iput_final().

Unnecessary since the caller had better be damned sure that the inode won't
disappear from underneath it anyway.

Signed-off-by: Trond Myklebust <[email protected]>
---
inode.c | 2 --
1 files changed, 2 deletions(-)

Index: linux-2.6.14-rc4/fs/nfs/inode.c
===================================================================
--- linux-2.6.14-rc4.orig/fs/nfs/inode.c
+++ linux-2.6.14-rc4/fs/nfs/inode.c
@@ -877,12 +877,10 @@ static int nfs_wait_on_inode(struct inod
 	sigset_t oldmask;
 	int error;
 
-	atomic_inc(&inode->i_count);
 	rpc_clnt_sigmask(clnt, &oldmask);
 	error = wait_on_bit_lock(&nfsi->flags, NFS_INO_REVALIDATING,
 					nfs_wait_schedule, TASK_INTERRUPTIBLE);
 	rpc_clnt_sigunmask(clnt, &oldmask);
-	iput(inode);
 
 	return error;
 }


Attachments:
linux-2.6.14-00-fix_iput.dif (919.00 B)
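
The reason the removed atomic_inc()/iput() pair is "Oopsable" can be sketched with a toy user-space model. This is not kernel code: every name below (toy_inode, toy_iput, and so on) is hypothetical, and the model only counts how often the teardown path is entered. The assumption it encodes is the one stated in the changelog: the wait helper can run while the inode is already being torn down, at which point i_count is 0. Taking a reference there and dropping it again brings the count back to 0 and re-enters teardown.

```c
#include <assert.h>

/* Toy stand-in for struct inode: a reference count plus bookkeeping. */
struct toy_inode {
    int i_count;               /* models inode->i_count */
    int teardown_calls;        /* how many times teardown was entered */
    int take_ref_in_wait;      /* 1 = pre-patch behaviour, 0 = patched */
};

void toy_iput(struct toy_inode *inode);

/* Models the wait helper that the patch changes. */
void toy_wait_on_inode(struct toy_inode *inode)
{
    if (inode->take_ref_in_wait) {
        inode->i_count++;      /* models atomic_inc(&inode->i_count) */
        /* ... the real code waits on a flag bit here ... */
        toy_iput(inode);       /* count hits 0 again: teardown re-entered */
    }
}

/* Models the final teardown path, which may call the wait helper. */
void toy_teardown(struct toy_inode *inode)
{
    assert(inode->i_count == 0);
    inode->teardown_calls++;
    /* Only the first entry calls the helper, so the model terminates.
     * A real kernel would be operating on freed or poisoned state
     * by the second entry. */
    if (inode->teardown_calls == 1)
        toy_wait_on_inode(inode);
}

/* Models iput(): drop one reference, tear down on the last one. */
void toy_iput(struct toy_inode *inode)
{
    if (--inode->i_count == 0)
        toy_teardown(inode);
}

/* Drop the last reference and report how often teardown ran. */
int teardown_count(int take_ref_in_wait)
{
    struct toy_inode inode = { 1, 0, take_ref_in_wait };
    toy_iput(&inode);
    return inode.teardown_calls;
}
```

In this model teardown_count(1) enters teardown twice, while teardown_count(0), which corresponds to the patched helper, enters it once.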

2005-10-14 18:13:58

by Chris Wright

Subject: Re: [OOPS] nfsv4 in linux 2.6.13 (-ck7)

* Gabriel A. Devenyi ([email protected]) wrote:
> Chris Wright wrote:
> >Client side AFAIK. May want to check with the nfs folks to see if
> >they've got any specific testing they'd find useful.
>
> Well the patch seems to have cleared this problem up, do you happen to
> know where the NFS folks can be located so I can provide further
> testing/feedback? Thanks.

They're at [email protected]

thanks,
-chris

2005-10-15 17:23:53

by Gabriel A. Devenyi

Subject: Re: [OOPS] nfsv4 in linux 2.6.13 (-ck7)

On October 14, 2005 11:49, Trond Myklebust wrote:
> Does the attached patch fix it for you?
>
> Cheers,
> Trond

The patch I found at http://www.citi.umich.edu/projects/nfsv4/linux/kernel-patches/2.6.13-1/linux-2.6.13-001-NFS_ALL_MODIFIED.dif
had already fixed my problem. I applied your patch on top, and everything seems to be working fine.

--
Gabriel A. Devenyi
[email protected]