Return-path:
Received: from magic.merlins.org ([209.81.13.136]:50030 "EHLO mail1.merlins.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753275Ab2GPRR3 (ORCPT ); Mon, 16 Jul 2012 13:17:29 -0400
Date: Mon, 16 Jul 2012 10:17:27 -0700
From: Marc MERLIN
To: Eric Dumazet
Cc: David Miller, Larry.Finger@lwfinger.net, bhutchings@solarflare.com, linux-wireless@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: 3.4.4/amd64 full interrupt hangs under big nfs copies
Message-ID: <20120716171727.GA18469@merlins.org> (sfid-20120716_191736_839993_73070509)
References: <1333998672.3007.245.camel@edumazet-glaptop> <20120409.153452.1284163346306246866.davem@davemloft.net> <1334030180.13293.98.camel@edumazet-glaptop> <20120410051127.GA32048@merlins.org> <1334038263.2907.1.camel@edumazet-glaptop> <20120411052733.GA17352@merlins.org> <20120715215935.GF24420@merlins.org> <1342419529.3265.12217.camel@edumazet-glaptop> <20120716151826.GA10586@merlins.org> <1342455717.2830.14.camel@edumazet-glaptop>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <1342455717.2830.14.camel@edumazet-glaptop>
Sender: linux-wireless-owner@vger.kernel.org
List-ID:

On Mon, Jul 16, 2012 at 06:21:57PM +0200, Eric Dumazet wrote:
> > No, it's actually when I'm 'uploading' from my laptop to my server.
> > One interesting thing is that my server is running lvm2 with snapshots,
> > which makes writes slower than my laptop can push data over the network, so
> > it's definitely causing buffers to fill up.
> > I just did a download test and got 4.5MB/s sustained without problems.
>
> Hmm, nfs apparently is able to push lot of data, try to reduce
> rsize/wsize to sane values, like 32K instead of 512K ?
>
> gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4
> rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0

Nice catch.
That seems like an excessive default from autofs5 5.0.4-3.2+b1

So, it helped. I still got hangs, but this time they were VFS hangs.
I couldn't do anything filesystem-related during the 'hangs', but the
interrupts weren't hung anymore, so I could move my mouse cursor.

Having NFS hang all of VFS and local disk is obviously still a problem, but
at this point it may not be a networking (or wireless) related problem.

I'll attach the relevant logs from that attempt.

Does that help?

Thanks,
Marc

[76903.011101] SysRq : Show Blocked State
[76903.011110] task PC stack pid father
[76903.011306] mc D ffff88021e2d3680 0 9383 9270 0x00000080
[76903.011314] ffff880111094100 0000000000000082 000000000000000e ffff880213549140
[76903.011322] 0000000000013680 ffff8800140e3fd8 ffff8800140e3fd8 ffff880111094100
[76903.011328] ffff88021e5c5258 0000000000000000 ffff880111094100 ffff8800140e3e40
[76903.011335] Call Trace:
[76903.011362] [] ? nfs_find_actor+0x66/0x66 [nfs]
[76903.011376] [] ? nfs_wait_bit_killable+0x5b/0x6e [nfs]
[76903.011384] [] ? __wait_on_bit_lock+0x3c/0x85
[76903.011391] [] ? filemap_fdatawait_range+0x11b/0x139
[76903.011397] [] ? out_of_line_wait_on_bit_lock+0x6f/0x78
[76903.011410] [] ? nfs_find_actor+0x66/0x66 [nfs]
[76903.011417] [] ? autoremove_wake_function+0x2a/0x2a
[76903.011435] [] ? nfs_commit_inode+0x66/0x27a [nfs]
[76903.011448] [] ? nfs_file_fsync+0x95/0xf3 [nfs]
[76903.011455] [] ? filp_close+0x3b/0x6a
[76903.011461] [] ? sys_close+0x86/0xc7
[76903.011467] [] ? system_call_fastpath+0x1a/0x1f
[76903.011482] kworker/0:0 D ffff88021e213680 0 13850 2 0x00000080
[76903.011489] ffff8801fac7d850 0000000000000046 ffff8802117cb848 ffff880140773750
[76903.011495] 0000000000013680 ffff88004c4e7fd8 ffff88004c4e7fd8 ffff8801fac7d850
[76903.011502] ffff88021e5df9a0 0000000000000000 ffff8801fac7d850 ffffffffa069be59
[76903.011508] Call Trace:
[76903.011524] [] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[76903.011535] [] ? rpc_wait_bit_killable+0x59/0x6c [sunrpc]
[76903.011541] [] ? __wait_on_bit+0x3e/0x71
[76903.011547] [] ? _raw_spin_unlock_irqrestore+0x30/0x3e
[76903.011553] [] ? out_of_line_wait_on_bit+0x6f/0x78
[76903.011565] [] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[76903.011570] [] ? autoremove_wake_function+0x2a/0x2a
[76903.011587] [] ? nfs_initiate_commit+0xf4/0x105 [nfs]
[76903.011604] [] ? nfs_commit_inode+0x1f4/0x27a [nfs]
[76903.011617] [] ? nfs_release_page+0x56/0x73 [nfs]
[76903.011626] [] ? shrink_page_list+0x556/0x739
[76903.011635] [] ? get_parent_ip+0x9/0x1b
[76903.011640] [] ? sub_preempt_count+0x83/0x94
[76903.011646] [] ? update_isolated_counts.isra.44+0x148/0x16e
[76903.011653] [] ? shrink_inactive_list+0x2b1/0x446
[76903.011661] [] ? shrink_mem_cgroup_zone+0x371/0x480
[76903.011668] [] ? shrink_zone+0x62/0x9b
[76903.011675] [] ? do_try_to_free_pages+0x1e4/0x434
[76903.011682] [] ? try_to_free_pages+0xb3/0xf9
[76903.011688] [] ? should_resched+0x5/0x23
[76903.011695] [] ? __alloc_pages_nodemask+0x4ef/0x7df
[76903.011702] [] ? get_parent_ip+0x9/0x1b
[76903.011711] [] ? alloc_pages_current+0xc7/0xe4
[76903.011723] [] ? iwlagn_rx_allocate+0x97/0x24d [iwlwifi]
[76903.011734] [] ? iwlagn_rx_replenish+0x3a/0x3a [iwlwifi]
[76903.011744] [] ? iwlagn_rx_replenish+0x18/0x3a [iwlwifi]
[76903.011750] [] ? process_one_work+0x16d/0x298
[76903.011757] [] ? worker_thread+0xc2/0x145
[76903.011763] [] ? manage_workers.isra.23+0x15b/0x15b
[76903.011768] [] ? kthread+0x7d/0x85
[76903.011774] [] ? kernel_thread_helper+0x4/0x10
[76903.011780] [] ? kthread_freezable_should_stop+0x37/0x37
[76903.011786] [] ? gs_change+0x13/0x13
[76903.011797] Sched Debug Version: v0.10, 3.4.4-amd64-preempt-noide-20120410 #1

and

[76843.153742]
[76873.080978] SysRq : Show Blocked State
[76873.080987] task PC stack pid father
[76873.081200] mc D ffff88021e293680 0 9383 9270 0x00000080
[76873.081208] ffff880111094100 0000000000000082 0000000000000001 ffff8802135107d0
[76873.081216] 0000000000013680 ffff8800140e3fd8 ffff8800140e3fd8 ffff880111094100
[76873.081222] ffff88010c9033d0 ffff88021e293680 ffff880111094100 ffffffff810bb429
[76873.081229] Call Trace:
[76873.081241] [] ? __lock_page+0x66/0x66
[76873.081249] [] ? io_schedule+0x55/0x6b
[76873.081254] [] ? sleep_on_page+0x6/0xa
[76873.081260] [] ? __wait_on_bit+0x3e/0x71
[76873.081265] [] ? wait_on_page_bit+0x6e/0x73
[76873.081272] [] ? autoremove_wake_function+0x2a/0x2a
[76873.081278] [] ? filemap_fdatawait_range+0x74/0x139
[76873.081285] [] ? filemap_write_and_wait_range+0x3b/0x4d
[76873.081308] [] ? nfs_file_fsync+0x5d/0xf3 [nfs]
[76873.081317] [] ? filp_close+0x3b/0x6a
[76873.081323] [] ? sys_close+0x86/0xc7
[76873.081330] [] ? system_call_fastpath+0x1a/0x1f
[76873.081346] kworker/0:0 D ffff88021e213680 0 13850 2 0x00000080
[76873.081352] ffff8801fac7d850 0000000000000046 ffff880186753ce8 ffff880126d7f040
[76873.081358] 0000000000013680 ffff88004c4e7fd8 ffff88004c4e7fd8 ffff8801fac7d850
[76873.081365] ffff8801c5ae1d70 ffff88021e213680 ffff8801fac7d850 ffffffff810bb429
[76873.081371] Call Trace:
[76873.081376] [] ? __lock_page+0x66/0x66
[76873.081381] [] ? io_schedule+0x55/0x6b
[76873.081386] [] ? sleep_on_page+0x6/0xa
[76873.081391] [] ? __wait_on_bit+0x3e/0x71
[76873.081396] [] ? wait_on_page_bit+0x6e/0x73
[76873.081402] [] ? autoremove_wake_function+0x2a/0x2a
[76873.081411] [] ? shrink_page_list+0x166/0x739
[76873.081420] [] ? get_parent_ip+0x9/0x1b
[76873.081425] [] ? sub_preempt_count+0x83/0x94
[76873.081431] [] ? update_isolated_counts.isra.44+0x148/0x16e
[76873.081438] [] ? shrink_inactive_list+0x2b1/0x446
[76873.081446] [] ? shrink_mem_cgroup_zone+0x371/0x480
[76873.081454] [] ? shrink_zone+0x62/0x9b
[76873.081460] [] ? do_try_to_free_pages+0x1e4/0x434
[76873.081467] [] ? try_to_free_pages+0xb3/0xf9
[76873.081473] [] ? should_resched+0x5/0x23
[76873.081481] [] ? __alloc_pages_nodemask+0x4ef/0x7df
[76873.081487] [] ? get_parent_ip+0x9/0x1b
[76873.081497] [] ? alloc_pages_current+0xc7/0xe4
[76873.081510] [] ? iwlagn_rx_allocate+0x97/0x24d [iwlwifi]
[76873.081521] [] ? iwlagn_rx_replenish+0x3a/0x3a [iwlwifi]
[76873.081530] [] ? iwlagn_rx_replenish+0x18/0x3a [iwlwifi]
[76873.081538] [] ? process_one_work+0x16d/0x298
[76873.081545] [] ? worker_thread+0xc2/0x145
[76873.081551] [] ? manage_workers.isra.23+0x15b/0x15b
[76873.081556] [] ? kthread+0x7d/0x85
[76873.081562] [] ? kernel_thread_helper+0x4/0x10
[76873.081568] [] ? kthread_freezable_should_stop+0x37/0x37
[76873.081574] [] ? gs_change+0x13/0x13
[76873.081585] 192.168.205.3-m D ffff88021e293680 0 14532 2 0x00000080
[76873.081590] ffff880206d600c0 0000000000000046 ffff880186733e60 ffff88004b4230c0
[76873.081597] 0000000000013680 ffff880022305fd8 ffff880022305fd8 ffff880206d600c0
[76873.081603] ffff88021e5bb778 0000000000000000 ffff880206d600c0 ffffffffa069be59
[76873.081609] Call Trace:
[76873.081625] [] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[76873.081637] [] ? rpc_wait_bit_killable+0x59/0x6c [sunrpc]
[76873.081642] [] ? __wait_on_bit+0x3e/0x71
[76873.081648] [] ? _raw_spin_unlock_irqrestore+0x30/0x3e
[76873.081654] [] ? out_of_line_wait_on_bit+0x6f/0x78
[76873.081665] [] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[76873.081671] [] ? autoremove_wake_function+0x2a/0x2a
[76873.081690] [] ? nfs4_run_open_task+0x101/0x12e [nfs]
[76873.081709] [] ? nfs4_open_recover_helper+0xbd/0x13f [nfs]
[76873.081724] [] ? nfs4_open_recover+0x64/0x113 [nfs]
[76873.081740] [] ? nfs4_open_expired+0x69/0xc4 [nfs]
[76873.081761] [] ? nfs4_do_reclaim+0x109/0x4a0 [nfs]
[76873.081779] [] ? nfs4_state_clear_reclaim_reboot.part.7+0xf6/0x10a [nfs]
[76873.081797] [] ? nfs4_run_state_manager+0x363/0x52e [nfs]
[76873.081814] [] ? nfs4_do_reclaim+0x4a0/0x4a0 [nfs]
[76873.081819] [] ? kthread+0x7d/0x85
[76873.081825] [] ? kernel_thread_helper+0x4/0x10
[76873.081830] [] ? kthread_freezable_should_stop+0x37/0x37
[76873.081836] [] ? gs_change+0x13/0x13
[76873.081842] Sched Debug Version: v0.10, 3.4.4-amd64-preempt-noide-20120410 #1

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/