2007-09-21 00:22:34

by Chakri n

[permalink] [raw]
Subject: NFS on loopback locks up entire system(2.6.23-rc6)?

Hi,

I am testing NFS on a loopback mount, and it locks up the entire
system with the 2.6.23-rc6 kernel.

I have mounted a local ext3 partition using loopback NFS (version 3)
and started my test program. The test program forks 20 threads; each
thread allocates 10MB and writes & reads a file on the loopback NFS
mount. After running for about 5 minutes, I cannot even log in to the
machine. Commands like ps hang in a live session.
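
For reference, a minimal sketch of the kind of test program described
above (not the actual program; the mount point /mnt/nfs, file names
and error handling are made up for illustration):
----------------------------------------------------------
/* 20 threads, each repeatedly writing and reading back a 10MB file
 * on the loopback NFS mount.  /mnt/nfs is an assumed mount point. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NTHREADS 20
#define FILESIZE (10 * 1024 * 1024)

static void *worker(void *arg)
{
    long id = (long)arg;
    char path[64], *buf = malloc(FILESIZE);
    int fd;

    if (!buf)
        return NULL;
    memset(buf, 'a', FILESIZE);
    snprintf(path, sizeof(path), "/mnt/nfs/testfile.%ld", id);

    for (;;) {
        fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            break;
        }
        if (write(fd, buf, FILESIZE) < 0)
            perror("write");
        if (lseek(fd, 0, SEEK_SET) == 0 && read(fd, buf, FILESIZE) < 0)
            perror("read");
        close(fd);
    }
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    long i;

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
----------------------------------------------------------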

The machine is a Dell 1950 with 4 GB of RAM, so there is plenty of
RAM & CPU to play with, and no other I/O-heavy processes are running
on the system.

vmstat output shows no buffers are actually getting transferred in or
out and iowait is 100%.

[root@h46 ~]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 24 116 110080 11132 3045664 0 0 0 0 28 345 0 1 0 99 0
0 24 116 110080 11132 3045664 0 0 0 0 5 329 0 0 0 100 0
0 24 116 110080 11132 3045664 0 0 0 0 26 336 0 0 0 100 0
0 24 116 110080 11132 3045664 0 0 0 0 8 335 0 0 0 100 0
0 24 116 110080 11132 3045664 0 0 0 0 26 352 0 0 0 100 0
0 24 116 110080 11132 3045664 0 0 0 0 8 351 0 0 0 100 0
0 24 116 110080 11132 3045664 0 0 0 0 23 358 0 1 0 99 0
0 24 116 110080 11132 3045664 0 0 0 0 10 350 0 0 0 100 0
0 24 116 110080 11132 3045664 0 0 0 0 26 363 0 0 0 100 0
0 24 116 110080 11132 3045664 0 0 0 0 8 346 0 1 0 99 0
0 24 116 110080 11132 3045664 0 0 0 0 26 360 0 0 0 100 0
0 24 116 110080 11140 3045656 0 0 8 0 11 345 0 0 0 100 0
0 24 116 110080 11140 3045664 0 0 0 0 27 355 0 0 2 97 0
0 24 116 110080 11140 3045664 0 0 0 0 9 330 0 0 0 100 0
0 24 116 110080 11140 3045664 0 0 0 0 26 358 0 0 0 100 0


The following are the backtraces, after the machine hangs, of
1. one of the threads of my test program,
2. the nfsd daemon, and
3. a generic command like pstree:
-------------------------------------------------------------
crash> bt 3252
PID: 3252 TASK: f6f3c610 CPU: 0 COMMAND: "test"
#0 [f6bdcc10] schedule at c0624a34
#1 [f6bdcc84] schedule_timeout at c06250ee
#2 [f6bdccc8] io_schedule_timeout at c0624c15
#3 [f6bdccdc] congestion_wait at c045eb7d
#4 [f6bdcd00] balance_dirty_pages_ratelimited_nr at c045ab91
#5 [f6bdcd54] generic_file_buffered_write at c0457148
#6 [f6bdcde8] __generic_file_aio_write_nolock at c04576e5
#7 [f6bdce40] try_to_wake_up at c042342b
#8 [f6bdce5c] generic_file_aio_write at c0457799
#9 [f6bdce8c] nfs_file_write at f8c25cee
#10 [f6bdced0] do_sync_write at c0472e27
#11 [f6bdcf7c] vfs_write at c0473689
#12 [f6bdcf98] sys_write at c0473c95
#13 [f6bdcfb4] sysenter_entry at c0404ddf
EAX: 00000004 EBX: 00000013 ECX: a4966008 EDX: 00980000
DS: 007b ESI: 00980000 ES: 007b EDI: a4966008
SS: 007b ESP: a5ae6ec0 EBP: a5ae6ef0
CS: 0073 EIP: b7eed410 ERR: 00000004 EFLAGS: 00000246
crash> bt 3188
PID: 3188 TASK: f74c4000 CPU: 1 COMMAND: "nfsd"
#0 [f6836c7c] schedule at c0624a34
#1 [f6836cf0] __mutex_lock_slowpath at c062543d
#2 [f6836d0c] mutex_lock at c0625326
#3 [f6836d18] generic_file_aio_write at c0457784
#4 [f6836d48] ext3_file_write at f8888fd7
#5 [f6836d64] do_sync_readv_writev at c0472d1f
#6 [f6836e08] do_readv_writev at c0473486
#7 [f6836e6c] vfs_writev at c047358e
#8 [f6836e7c] nfsd_vfs_write at f8e7f8d7
#9 [f6836ee0] nfsd_write at f8e80139
#10 [f6836f10] nfsd3_proc_write at f8e86afd
#11 [f6836f44] nfsd_dispatch at f8e7c20c
#12 [f6836f6c] svc_process at f89c18e0
#13 [f6836fbc] nfsd at f8e7c794
#14 [f6836fe4] kernel_thread_helper at c0405a35
crash> ps|grep ps
234 2 3 cb194000 IN 0.0 0 0 [khpsbpkt]
520 2 0 f7e18c20 IN 0.0 0 0 [kpsmoused]
2859 1 2 f7f3cc20 IN 0.1 9600 2040 cupsd
3340 3310 0 f4a0f840 UN 0.0 4360 816 pstree
3343 3284 2 f4a0f230 UN 0.0 4212 944 ps
crash> bt 3340
PID: 3340 TASK: f4a0f840 CPU: 0 COMMAND: "pstree"
#0 [e856be30] schedule at c0624a34
#1 [e856bea4] rwsem_down_failed_common at c04df6c0
#2 [e856bec4] rwsem_down_read_failed at c0625c2a
#3 [e856bedc] call_rwsem_down_read_failed at c0625c96
#4 [e856bee8] down_read at c043c21a
#5 [e856bef0] access_process_vm at c0462039
#6 [e856bf38] proc_pid_cmdline at c04a1bbb
#7 [e856bf58] proc_info_read at c04a2f41
#8 [e856bf7c] vfs_read at c04737db
#9 [e856bf98] sys_read at c0473c2e
#10 [e856bfb4] sysenter_entry at c0404ddf
EAX: 00000003 EBX: 00000005 ECX: 0804dc58 EDX: 00000062
DS: 007b ESI: 00000cba ES: 007b EDI: 0804e0e0
SS: 007b ESP: bfa3afe8 EBP: bfa3d4f8
CS: 0073 EIP: b7f64410 ERR: 00000003 EFLAGS: 00000246
----------------------------------------------------------

Any ideas what could potentially trigger this?

Please let me know if you would like to get any other specific details.

Thanks
--Chakri


2007-09-21 00:43:55

by Myklebust, Trond

[permalink] [raw]
Subject: Re: NFS on loopback locks up entire system(2.6.23-rc6)?

On Thu, 2007-09-20 at 17:22 -0700, Chakri n wrote:
> Hi,
>
> I am testing NFS on loopback locks up entire system with 2.6.23-rc6 kernel.
>
> I have mounted a local ext3 partition using loopback NFS (version 3)
> and started my test program. The test program forks 20 threads
> allocates 10MB for each thread, writes & reads a file on the loopback
> NFS mount. After running for about 5 min, I cannot even login to the
> machine. Commands like ps etc, hang in a live session.
>
> The machine is a DELL 1950 with 4Gig of RAM, so there is plenty of RAM
> & CPU to play around and no other io/heavy processes are running on
> the system.
>
> vmstat output shows no buffers are actually getting transferred in or
> out and iowait is 100%.
>
...
>
> Any ideas what could potentially trigger this?

This is pretty much expected: the NFS server is trying to allocate
memory. The VM then tries to satisfy that demand by freeing up resources
from the NFS client by telling the client to write out cached pages. The
client again is in a congested state in which it is waiting on the NFS
server to finish writing out what it already sent.
Quod Erat Deadlocked...

This is why 'mount --bind' is a good idea, and 'mount -t nfs localhost:'
is generally a bad one...

Cheers
Trond

2007-09-21 03:12:48

by Chakri n

[permalink] [raw]
Subject: Re: NFS on loopback locks up entire system(2.6.23-rc6)?

Thanks Trond, for clarifying this for me.

I have seen similar behavior when a remote NFS server is not
available. Many processes end up waiting in nfs_release_page. So what
happens if the remote server is not available? nfs_release_page
cannot free the memory, since it waits on an RPC request to complete,
which never completes, and processes wait there forever?

And unfortunately in my case, I cannot use "mount --bind". I want to
use the same file system from two different nodes, and I want file &
record locking to be consistent. The only way I know to make sure
locking is consistent is to use loopback NFS on one host and NFS-mount
the same file system on the other nodes, so that the NFS server keeps
file & record locking consistent. Is there any alternative to this?

Is it possible, or are there any efforts, to integrate local file
system (ext3, etc.) locking with network file system locking, so that
a user can use "mount --bind" on the local host and NFS mounts on the
remote nodes, while file & record locking stays consistent between
the nodes?

Thanks
--Chakri

On 9/20/07, Trond Myklebust <[email protected]> wrote:
> On Thu, 2007-09-20 at 17:22 -0700, Chakri n wrote:
> > Hi,
> >
> > I am testing NFS on loopback locks up entire system with 2.6.23-rc6 kernel.
> >
> > I have mounted a local ext3 partition using loopback NFS (version 3)
> > and started my test program. The test program forks 20 threads
> > allocates 10MB for each thread, writes & reads a file on the loopback
> > NFS mount. After running for about 5 min, I cannot even login to the
> > machine. Commands like ps etc, hang in a live session.
> >
> > The machine is a DELL 1950 with 4Gig of RAM, so there is plenty of RAM
> > & CPU to play around and no other io/heavy processes are running on
> > the system.
> >
> > vmstat output shows no buffers are actually getting transferred in or
> > out and iowait is 100%.
> >
...
> >
> > Any ideas what could potentially trigger this?
>
> This is pretty much expected: the NFS server is trying to allocate
> memory. The VM then tries to satisfy that demand by freeing up resources
> from the NFS client by telling the client to write out cached pages. The
> client again is in a congested state in which it is waiting on the NFS
> server to finish writing out what it already sent.
> Quod Erat Deadlocked...
>
> This is why 'mount --bind' is a good idea, and 'mount -t nfs localhost:'
> is generally a bad one...
>
> Cheers
> Trond
>

2007-09-21 12:47:42

by J. Bruce Fields

[permalink] [raw]
Subject: Re: NFS on loopback locks up entire system(2.6.23-rc6)?

On Thu, Sep 20, 2007 at 08:12:34PM -0700, Chakri n wrote:
> And unfortunately in my case, I cannot use "mount --bind". I want to
> use the same file system from two different nodes, and I want file &
> record locking to be consistent. The only way to make sure locking is
> consistent is to use loopback NFS on 1 host and NFS mount the same
> file system on other nodes, so that NFS server ensures file & record
> locking to be consistent. Is there any alternative to this?

The NFS server acquires a lock on the exported filesystem before
granting it to an NFS client, so loopback mounting shouldn't add any
extra safety. If you've witnessed some exception to this rule, please
let us know.

> Is it possible or any efforts to integrate ext3 or other local file
> systems locking & network file system locking, so that user can use
> "mount --bind" on local host and NFS mount on remote nodes, but file &
> record locking will be consistent between both the nodes?

That's the way it works already, more or less.

(It depends a little on what you mean by "file and record locking". The
byte-range locks acquired by fcntl() have always been enforced over NFS.
Locks acquired by flock() haven't always been, but loopback mounting
doesn't solve this problem.)
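
For illustration, a minimal sketch of the fcntl() style of locking
referred to above (the path is made up; this is only an example of
taking a POSIX byte-range lock, the kind that is enforced over NFS):
----------------------------------------------------------
/* Take an exclusive POSIX (fcntl) lock on a file on an NFS mount.
 * /mnt/nfs/shared.dat is an assumed path for illustration only. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    struct flock fl = {
        .l_type   = F_WRLCK,    /* exclusive write lock */
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,          /* 0 = lock the whole file */
    };
    int fd = open("/mnt/nfs/shared.dat", O_RDWR | O_CREAT, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* block until granted */
        perror("fcntl(F_SETLKW)");
        return 1;
    }
    /* ... access the file under the lock ... */
    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}
----------------------------------------------------------
An flock(fd, LOCK_EX) call in the same place is the BSD-style lock
that, as noted above, has not always been enforced over NFS.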

--b.

2007-09-21 14:13:56

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [NFS] NFS on loopback locks up entire system(2.6.23-rc6)?

On Thu, 2007-09-20 at 20:12 -0700, Chakri n wrote:
> Thanks Trond, for clarifying this for me.
>
> I have seen similar behavior when a remote NFS server is not
> available. Many processes wait end up waiting in nfs_release_page. So,
> what will happen if the remote server is not available,
> nfs_release_page cannot free the memory since it waits on rpc request
> to complete, which never completes and processes wait in there for
> ever?
>
> And unfortunately in my case, I cannot use "mount --bind". I want to
> use the same file system from two different nodes, and I want file &
> record locking to be consistent. The only way to make sure locking is
> consistent is to use loopback NFS on 1 host and NFS mount the same
> file system on other nodes, so that NFS server ensures file & record
> locking to be consistent. Is there any alternative to this?
>
> Is it possible or any efforts to integrate ext3 or other local file
> systems locking & network file system locking, so that user can use
> "mount --bind" on local host and NFS mount on remote nodes, but file &
> record locking will be consistent between both the nodes?

Could you be a bit more specific? Is the problem that your application
is using BSD locks (flock()) instead of POSIX locks?

Cheers
Trond

2007-09-21 16:20:22

by Chakri n

[permalink] [raw]
Subject: Re: [NFS] NFS on loopback locks up entire system(2.6.23-rc6)?

Thanks.

I was using flock (BSD locking) and I think the problem should be
solved if I move my application to use POSIX locks.

And is there any option to avoid processes waiting indefinitely to
free pages when NFS requests are stuck on an unresponsive NFS server?

Thanks
--Chakri

On 9/21/07, Trond Myklebust <[email protected]> wrote:
> On Thu, 2007-09-20 at 20:12 -0700, Chakri n wrote:
> > Thanks Trond, for clarifying this for me.
> >
> > I have seen similar behavior when a remote NFS server is not
> > available. Many processes wait end up waiting in nfs_release_page. So,
> > what will happen if the remote server is not available,
> > nfs_release_page cannot free the memory since it waits on rpc request
> > to complete, which never completes and processes wait in there for
> > ever?
> >
> > And unfortunately in my case, I cannot use "mount --bind". I want to
> > use the same file system from two different nodes, and I want file &
> > record locking to be consistent. The only way to make sure locking is
> > consistent is to use loopback NFS on 1 host and NFS mount the same
> > file system on other nodes, so that NFS server ensures file & record
> > locking to be consistent. Is there any alternative to this?
> >
> > Is it possible or any efforts to integrate ext3 or other local file
> > systems locking & network file system locking, so that user can use
> > "mount --bind" on local host and NFS mount on remote nodes, but file &
> > record locking will be consistent between both the nodes?
>
> Could you be a bit more specific? Is the problem that your application
> is using BSD locks (flock()) instead of POSIX locks?
>
> Cheers
> Trond
>

2007-09-21 16:24:51

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [NFS] NFS on loopback locks up entire system(2.6.23-rc6)?

On Fri, 2007-09-21 at 09:20 -0700, Chakri n wrote:
> Thanks.
>
> I was using flock (BSD locking) and I think the problem should be
> solved if I move my application to use POSIX locks.

Yup.

> And any option to avoid processes waiting indefinitely to free pages
> from NFS requests waiting on unresponsive NFS server?

The only solution I know of is to use soft mounts, but that brings
another set of problems:
1. most applications don't know how to recover safely from an EIO
error.
2. You lose data.
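
For illustration, roughly what "recovering safely" would demand of an
application on a soft mount (a sketch with a made-up path, not a
recommendation): the EIO can surface at write(), fsync() or close(),
so all of them would need checking.
----------------------------------------------------------
/* Sketch: on a soft mount the client may return EIO once the RPC
 * retransmissions give up, and for buffered writes the error may
 * only show up at fsync() or close().  An application would have to
 * check all of these and decide what to do; most do not. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;      /* EIO from a timed-out soft mount lands here */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

int main(void)
{
    const char data[] = "important data\n";
    int fd = open("/mnt/nfs/out.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0 || write_all(fd, data, sizeof(data) - 1) < 0 ||
        fsync(fd) < 0 || close(fd) < 0) {
        fprintf(stderr, "write failed: %s\n", strerror(errno));
        return 1;           /* data cannot be assumed to be on the server */
    }
    return 0;
}
----------------------------------------------------------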

Cheers
Trond

2007-09-21 18:06:24

by Chakri n

[permalink] [raw]
Subject: Re: [NFS] NFS on loopback locks up entire system(2.6.23-rc6)?

Isn't this a strict requirement on the client side, asking to
guarantee that the server stays up all the time?

I have seen many cases where people go and directly change the IP of
their NFS filers or servers without thinking much about the clients
using them.

Can we get around this with some sort of congestion logic?

Thanks
--Chakri

On 9/21/07, Trond Myklebust <[email protected]> wrote:
> On Fri, 2007-09-21 at 09:20 -0700, Chakri n wrote:
> > Thanks.
> >
> > I was using flock (BSD locking) and I think the problem should be
> > solved if I move my application to use POSIX locks.
>
> Yup.
>
> > And any option to avoid processes waiting indefinitely to free pages
> > from NFS requests waiting on unresponsive NFS server?
>
> The only solution I know of is to use soft mounts, but that brings
> another set of problems:
> 1. most applications don't know how to recover safely from an EIO
> error.
> 2. You lose data.
>
> Cheers
> Trond
>

2007-09-21 18:14:40

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [NFS] NFS on loopback locks up entire system(2.6.23-rc6)?

No. The requirement for 'hard' mounts is not that the server be up all
the time. The server can go up and down as it pleases: the client can
happily recover from that.

The requirement is rather that nobody remove it permanently before the
application is done with it, and the partition is unmounted. That is
hardly unreasonable (it is the only way I know of to ensure data
integrity), and it is much less strict than the requirements for local
disks.

Trond

On Fri, 2007-09-21 at 11:06 -0700, Chakri n wrote:
> Isn't this a strict requirement from client side, asking to guarantee
> that a server stays up all the time?
>
> I have seen many cases, where people go and directly change IP of
> their NFS filers or servers worrying least about the clients using
> them.
>
> Can we get around with some sort of congestion logic?
>
> Thanks
> --Chakri
>
> On 9/21/07, Trond Myklebust <[email protected]> wrote:
> > On Fri, 2007-09-21 at 09:20 -0700, Chakri n wrote:
> > > Thanks.
> > >
> > > I was using flock (BSD locking) and I think the problem should be
> > > solved if I move my application to use POSIX locks.
> >
> > Yup.
> >
> > > And any option to avoid processes waiting indefinitely to free pages
> > > from NFS requests waiting on unresponsive NFS server?
> >
> > The only solution I know of is to use soft mounts, but that brings
> > another set of problems:
> > 1. most applications don't know how to recover safely from an EIO
> > error.
> > 2. You lose data.
> >
> > Cheers
> > Trond
> >

2007-09-22 06:28:57

by Chakri n

[permalink] [raw]
Subject: Re: [NFS] NFS on loopback locks up entire system(2.6.23-rc6)?

On 9/21/07, Trond Myklebust <[email protected]> wrote:
> No. The requirement for 'hard' mounts is not that the server be up all
> the time. The server can go up and down as it pleases: the client can
> happily recover from that.
>
> The requirement is rather that nobody remove it permanently before the
> application is done with it, and the partition is unmounted. That is
> hardly unreasonable (it is the only way I know of to ensure data
> integrity), and it is much less strict than the requirements for local
> disks.

Yes. I completely agree. This is required for data consistency.

But in my testing, if one of the NFS servers/mounts goes offline for
some period of time, the entire system slows down, especially I/O.

In my test program, I forked off 50 threads to do 4K writes on 50
different files in an NFS-mounted directory.

Now, I have turned off the NFS server and started another dd process
on the local disk ("dd if=/dev/zero of=/tmp/x count=1000"), and even
this dd process makes no progress.

I see I/O wait of 100% in vmstat.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 21 0 2628416 15152 551024 0 0 0 0 28 344 0 0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 8 340 0 0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 26 343 0 0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 8 341 0 0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 26 357 0 0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 8 325 0 0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 26 343 0 0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 8 325 0 0 0 100 0

I have about 4 GB of RAM in the system and most of the memory is
free. I see only about 550MB in buffers/cache; the rest is pretty
much available.

[root@h46 ~]# free
total used free shared buffers cached
Mem: 3238004 609340 2628664 0 15136 551024
-/+ buffers/cache: 43180 3194824
Swap: 4096532 0 4096532

Here are the stack traces for the dd process and one of my test
program threads; both of them are stuck in congestion_wait.
--------------------------------------
PID: 3552 TASK: cb1fc610 CPU: 0 COMMAND: "dd"
#0 [f5c04c38] schedule at c0624a34
#1 [f5c04cac] schedule_timeout at c06250ee
#2 [f5c04cf0] io_schedule_timeout at c0624c15
#3 [f5c04d04] congestion_wait at c045eb7d
#4 [f5c04d28] balance_dirty_pages_ratelimited_nr at c045ab91
#5 [f5c04d7c] generic_file_buffered_write at c0457148
#6 [f5c04e10] __generic_file_aio_write_nolock at c04576e5
#7 [f5c04e84] generic_file_aio_write at c0457799
#8 [f5c04eb4] ext3_file_write at f8888fd7
#9 [f5c04ed0] do_sync_write at c0472e27
#10 [f5c04f7c] vfs_write at c0473689
#11 [f5c04f98] sys_write at c0473c95
#12 [f5c04fb4] sysenter_entry at c0404ddf
------------------------------------------
#0 [f6050c10] schedule at c0624a34
#1 [f6050c84] schedule_timeout at c06250ee
#2 [f6050cc8] io_schedule_timeout at c0624c15
#3 [f6050cdc] congestion_wait at c045eb7d
#4 [f6050d00] balance_dirty_pages_ratelimited_nr at c045ab91
#5 [f6050d54] generic_file_buffered_write at c0457148
#6 [f6050de8] __generic_file_aio_write_nolock at c04576e5
#7 [f6050e40] enqueue_entity at c042131f
#8 [f6050e5c] generic_file_aio_write at c0457799
#9 [f6050e8c] nfs_file_write at f8f90cee
#10 [f6050e9c] getnstimeofday at c043d3f7
#11 [f6050ed0] do_sync_write at c0472e27
#12 [f6050f7c] vfs_write at c0473689
#13 [f6050f98] sys_write at c0473c95
#14 [f6050fb4] sysenter_entry at c0404ddf
-----------------------------------

Can this be worked around? Since most of the RAM is available, the dd
process could in fact find more memory for its buffers rather than
waiting on the NFS requests. I believe this could be one reason why
file systems like VxFS use their own buffer cache, separate from the
system-wide buffer cache.

Thanks
--Chakri

2007-10-06 07:35:55

by Pavel Machek

[permalink] [raw]
Subject: Re: NFS on loopback locks up entire system(2.6.23-rc6)?

Hi!

> > I am testing NFS on loopback locks up entire system with 2.6.23-rc6 kernel.
> >
> > I have mounted a local ext3 partition using loopback NFS (version 3)
> > and started my test program. The test program forks 20 threads
> > allocates 10MB for each thread, writes & reads a file on the loopback
> > NFS mount. After running for about 5 min, I cannot even login to the
> > machine. Commands like ps etc, hang in a live session.
> >
> > The machine is a DELL 1950 with 4Gig of RAM, so there is plenty of RAM
> > & CPU to play around and no other io/heavy processes are running on
> > the system.
...
> > Any ideas what could potentially trigger this?
>
> This is pretty much expected: the NFS server is trying to allocate
> memory. The VM then tries to satisfy that demand by freeing up resources
> from the NFS client by telling the client to write out cached pages. The
> client again is in a congested state in which it is waiting on the NFS
> server to finish writing out what it already sent.
> Quod Erat Deadlocked...

Could we add a nice warning to the documentation? Or make the kernel
printk(ALERT) when a user tries to mount 127.0.0.1 (OK, that may not
be completely foolproof)? Or make the server refuse connections from
localhost and syslog loudly?
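
As a rough userspace illustration of the kind of check being
suggested (a hypothetical wrapper around mount, not the actual kernel
change, which would live in the NFS mount/export paths): resolve the
server name and complain if it points back at the local host.
----------------------------------------------------------
/* Hypothetical pre-mount check: warn if the NFS server address is a
 * loopback address (127.0.0.0/8 or ::1).  Illustration only. */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

static int is_loopback(const char *host)
{
    struct addrinfo *res, *ai;
    int found = 0;

    if (getaddrinfo(host, NULL, NULL, &res) != 0)
        return 0;
    for (ai = res; ai; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET) {
            struct in_addr a =
                ((struct sockaddr_in *)ai->ai_addr)->sin_addr;
            if ((ntohl(a.s_addr) >> 24) == 127)
                found = 1;
        } else if (ai->ai_family == AF_INET6) {
            struct sockaddr_in6 *s6 =
                (struct sockaddr_in6 *)ai->ai_addr;
            if (IN6_IS_ADDR_LOOPBACK(&s6->sin6_addr))
                found = 1;
        }
    }
    freeaddrinfo(res);
    return found;
}

int main(int argc, char **argv)
{
    if (argc > 1 && is_loopback(argv[1]))
        fprintf(stderr, "warning: NFS server %s is this host; "
                "a loopback NFS mount can deadlock, consider "
                "mount --bind instead\n", argv[1]);
    return 0;
}
----------------------------------------------------------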
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html