2005-09-25 13:16:54

by Malte Schröder

Subject: Problem with nfs4, kernel 2.6.13.2

Hello list.
When doing lots of reads over nfs4 (i.e. a few gigabytes), I can reproduce the
following error (using tar -c <my-nfs-path> | dd of=/dev/null):

#########################
Unable to handle kernel paging request at 0000000000100108 RIP:
<ffffffff80198248>{generic_drop_inode+56}
PGD 141bd067 PUD 141c2067 PMD 0
Oops: 0002 [1] PREEMPT
CPU 0
Modules linked in: nfs lockd nfs_acl sunrpc thermal fan button ac battery
autofs4 af_packet joydev parport_pc parport floppy tuner tvaudio bttv
video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom videodev
skge usbhid ehci_hcd uhci_hcd usbcore snd_emu10k1_synth snd_emux_synth
snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 snd_ac97_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_page_alloc snd_util_mem snd_hwdep eth1394 via82cxxx
sata_via ohci1394 ieee1394 nls_utf8 ntfs nls_base ext3 jbd mbcache
snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq
snd_timer snd_seq_device snd soundcore ipv6 w83627hf_wdt w83781d i2c_isa
i2c_viapro i2c_dev i2c_sensor powernow_k8 freq_table processor
cpufreq_ondemand ide_floppy ide_cd cdrom ide_core unix reiserfs sd_mod
sata_promise libata scsi_mod evdev psmouse
Pid: 129, comm: kswapd0 Not tainted 2.6.13.2
RIP: 0010:[<ffffffff80198248>] <ffffffff80198248>{generic_drop_inode+56}
RSP: 0018:ffff81003f463bd8 EFLAGS: 00010246
RAX: 0000000000200200 RBX: ffff81001c3ae5c0 RCX: 0000000000100100
RDX: ffff81001c3ae5d0 RSI: ffff810021a7e400 RDI: ffff81001c3ae5c0
RBP: ffff81001c3ae5c0 R08: 00000000fffffffa R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff80198210 R12: ffff81000a48d400
R13: ffff81001c3ae430 R14: ffff81000a48d800 R15: ffff81003f463cc8
FS: 00002aaaab01f6d0(0000) GS:ffffffff803ee800(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000100108 CR3: 0000000011ff1000 CR4: 00000000000006e0
Process kswapd0 (pid: 129, threadinfo ffff81003f462000, task ffff81003f45ee50)
Stack: 0000000000000000 ffffffff8840f256 000000000000065d ffff81001c3ae4c0
ffff81001c3ae6d8 ffffffff8015940f ffff81003f463c38 ffff81001c3ae5c0
ffff81001c3ae5a8 ffffffff8016334a
Call Trace:<ffffffff8840f256>{:nfs:__nfs_revalidate_inode+278}
<ffffffff8015940f>{find_get_pages_tag+47}
<ffffffff8016334a>{pagevec_lookup_tag+26}
<ffffffff80159a28>{wait_on_page_writeback_range+200}
<ffffffff88415ec7>{:nfs:nfs_wait_on_requests+247}
<ffffffff8842acea>{:nfs:nfs_do_return_delegation+42}
<ffffffff8840ef00>{:nfs:nfs4_clear_inode+32}
<ffffffff80197aa3>{clear_inode+147}
<ffffffff80197b48>{dispose_list+104}
<ffffffff80197e69>{shrink_icache_memory+553}
<ffffffff80195694>{shrink_dcache_memory+4}
<ffffffff8016467c>{shrink_slab+220}
<ffffffff80165b99>{balance_pgdat+617} <ffffffff80165e07>{kswapd+279}
<ffffffff8014b880>{autoremove_wake_function+0}
<ffffffff8010f692>{child_rip+8}
<ffffffff80165cf0>{kswapd+0} <ffffffff8010f68a>{child_rip+0}


Code: 48 89 41 08 48 89 08 48 8b 05 ba f2 17 00 48 89 50 08 48 89
RIP <ffffffff80198248>{generic_drop_inode+56} RSP <ffff81003f463bd8>
CR2: 0000000000100108
<6>note: kswapd0[129] exited with preempt_count 1
#########################

This also happens on two other machines running i386 kernels.
I hope the above data helps. If you need other information, I will provide it.

Greets
--
---------------------------------------
Malte Schröder
[email protected]
ICQ# 68121508
---------------------------------------
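A note on reading the oops above. This is an interpretation and is not stated by anyone in the thread. The faulting address in CR2 (0x100108) sits 8 bytes above 0x00100100, and RAX holds 0x00200200. In 2.6-era kernels these are the values of LIST_POISON1 and LIST_POISON2, which list_del() writes into the next and prev pointers of a removed entry. The Code: bytes appear to start at the faulting instruction, and "48 89 41 08" decodes to "mov %rax,0x8(%rcx)", which would match a second list removal on an entry that had already been deleted. The small Python helper below only does the address arithmetic; the function name and the 0x100 window are made up for illustration.

```python
# Poison values used by list_del() in 2.6-era kernels (include/linux/list.h).
LIST_POISON1 = 0x00100100  # stored in entry->next after list_del()
LIST_POISON2 = 0x00200200  # stored in entry->prev after list_del()


def classify_fault(addr, max_offset=0x100):
    """Return e.g. 'LIST_POISON1+0x8' if addr lies just above a poison value.

    max_offset is an arbitrary window chosen for this sketch; it only needs
    to cover small structure offsets such as list_head.prev (offset 8 on
    x86-64, assuming 8-byte pointers).
    """
    for name, base in (("LIST_POISON1", LIST_POISON1),
                       ("LIST_POISON2", LIST_POISON2)):
        if base <= addr < base + max_offset:
            return f"{name}+{addr - base:#x}"
    return None


# Register values copied from the oops above.
cr2 = 0x0000000000100108
rax = 0x0000000000200200
rcx = 0x0000000000100100

for label, value in (("CR2", cr2), ("RAX", rax), ("RCX", rcx)):
    print(label, hex(value), classify_fault(value))
```

A hit on CR2 together with poison values in RAX and RCX is only a hint that a list entry was unlinked twice; it does not by itself say which code path did the first removal.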



2005-09-26 12:28:54

by Myklebust, Trond

Subject: Re: Problem with nfs4, kernel 2.6.13.2

On Sun, 25.09.2005 at 15:16 (+0200), Malte Schröder wrote:
> Hello list.
> When doing lots of reads over nfs4 (i.e. a few gigabytes), I can reproduce the
> following error (using tar -c <my-nfs-path> | dd of=/dev/null):

Hi,

Could you give us some details about your setup (client _and_ server)
please?

Also, is this something that is NFSv4 only, or can you reproduce it on
NFSv2/v3 too?

Cheers,
Trond


2005-09-26 14:23:04

by Malte Schröder

Subject: Re: Problem with nfs4, kernel 2.6.13.2

On Monday 26 September 2005 14:28, Trond Myklebust wrote:
> Could you give us some details about your setup (client _and_ server)
> please?

Clients and server are running Debian/Sid (one client amd64, the rest i386)
with the kernel version given above. NFS userspace is at version 1.0.7. The
kernel was built with Debian's "gcc version 4.0.2 20050917 (prerelease)
(Debian 4.0.1-8)".
The server is using the kernel-space NFS server.

> Also, is this something that is NFSv4 only, or can you reproduce it on
> NFSv2/v3 too?

I will try, but I will have to reconfigure client and server first ...



2005-09-26 20:18:41

by Malte Schröder

Subject: Re: Problem with nfs4, kernel 2.6.13.2

On Monday 26 September 2005 14:28, Trond Myklebust wrote:
> Also, is this something that is NFSv4 only, or can you reproduce it on
> NFSv2/v3 too?

I have been running my "stress test" for a few hours with nfsv3, without
problems.
I tried over nfsv4 again and it crashed after a few minutes.
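For anyone who wants to repeat this comparison, the read load from the first message (tar -c <my-nfs-path> | dd of=/dev/null) can be approximated with a short Python script. This is only a sketch: read_tree is a made-up name, the 1 MiB chunk size is arbitrary, and unlike tar it reads file contents only and ignores metadata and special files.

```python
import os
import sys


def read_tree(root, chunk_size=1 << 20):
    """Read every regular file under root and return the total bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip broken symlinks, sockets, and other non-regular entries.
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total


if __name__ == "__main__":
    # Usage: python read_tree.py <my-nfs-path>
    if len(sys.argv) > 1:
        print(read_tree(sys.argv[1]), "bytes read")
```

Running it once against an NFSv3 mount and once against an NFSv4 mount of the same export gives the same kind of read-only load as the tar pipeline, although the access pattern is not identical.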

--
---------------------------------------
Malte Schröder
[email protected]
ICQ# 68121508
---------------------------------------



2005-10-09 20:57:31

by Malte Schröder

Subject: Re: Problem with nfs4, kernel 2.6.13.2

Malte Schröder wrote:
> On Monday 26 September 2005 14:28, Trond Myklebust wrote:
>
>>Also, is this something that is NFSv4 only, or can you reproduce it on
>>NFSv2/v3 too?
>
>
> I have been running my "stress test" for a few hours with nfsv3, without
> problems.
> I tried over nfsv4 again and it crashed after a few minutes.
>

I had time to try this with 2.6.12.5 and was not able to reproduce the
error there.
I also found the following post on LKML:

Bret Towe wrote:
> On 9/8/05, Bret Towe <[email protected]> wrote:
>
>>On 9/6/05, J. Bruce Fields <[email protected]> wrote:
>>
>>>On Mon, Sep 05, 2005 at 08:40:53PM -0700, Bret Towe wrote:
>>>
>>>>Pid: 14169, comm: xmms Tainted: G M 2.6.13
>>>
>>>Hm, can someone explain what that means? A proprietary module was
>>>loaded then unloaded, maybe?
>>>
>>>You may also want to retest with
>>>
>>>http://www.citi.umich.edu/projects/nfsv4/linux/kernel-patches/2.6.13-1/linux-2.6.13-001-NFS_ALL_MODIFIED.dif
>>>
>>>applied, to make sure there isn't a patch in Trond's series that already
>>>fixes the bug.
>>>
>>>--b.
>>>
>>
>>ive been running this since i got the url and so far i havent hit it
>>ive also been a bit busy so i havent been able to make sure its good
>>this weekend i should be able to test it and make sure its solved
>>
>
> ran it pretty hard over the weekend and i had no crashs at all
> so i think its safe to say this patch fixes the issues i was seeing

The above server is currently unreachable from my part of the net but
Bret Towe seemed to have the same problem as I have. Since the problem
also appears when using 2.6.14-rc3 I think the patch should be looked at
and maybe considered for inclusion. As soon as I gain access to that
patch I will test it and report my results.



2005-10-10 15:25:47

by Myklebust, Trond

Subject: Re: Problem with nfs4, kernel 2.6.13.2

On Sun, 09.10.2005 at 22:57 (+0200), Malte Schröder wrote:
> >>>
> >>>You may also want to retest with
> >>>
> >>>http://www.citi.umich.edu/projects/nfsv4/linux/kernel-patches/2.6.13-1/linux-2.6.13-001-NFS_ALL_MODIFIED.dif
> >>>

> The above server is currently unreachable from my part of the net but
> Bret Towe seemed to have the same problem as I have. Since the problem
> also appears when using 2.6.14-rc3 I think the patch should be looked at
> and maybe considered for inclusion. As soon as I gain access to that
> patch I will test it and report my results.

I wrote all the elements in that patch so believe me, it has been
considered. 8-)

...however I still need to clean a few things up a bit before I'm ready
to send it on to Andrew and Linus. Do not expect it to appear in 2.6.14,
but rather in 2.6.15 (and possibly the 2.6.14-mm).

Cheers,
Trond

2005-10-10 19:33:28

by Malte Schröder

Subject: Re: Problem with nfs4, kernel 2.6.13.2

Trond Myklebust wrote:
> On Sun, 09.10.2005 at 22:57 (+0200), Malte Schröder wrote:
>
>>>>>You may also want to retest with
>>>>>
>>>>>http://www.citi.umich.edu/projects/nfsv4/linux/kernel-patches/2.6.13-1/linux-2.6.13-001-NFS_ALL_MODIFIED.dif
>>>>>
> I wrote all the elements in that patch so believe me, it has been
> considered. 8-)
>
> ...however I still need to clean a few things up a bit before I'm ready
> to send it on to Andrew and Linus. Do not expect it to appear in 2.6.14,
> but rather in 2.6.15 (and possibly the 2.6.14-mm).

I have been doing some testing of 2.6.13.2 with that patch applied on
two different machines and it seems to be working :)