From: admin
To: Xin Zhou
Cc: Michal Hocko, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, David Sterba, Chris Mason
Subject: Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
Date: Thu, 15 Dec 2016 11:52:32 +0000
Message-ID: <20161215115232.Horde.NFLyVFrw2pKqM_WBO_YP3Nt@secure.prnet.org>

Hi,

The source is a software RAID 5 (md) of 4x 4TB Western Digital RE4 disks. The destination is a hardware RAID 5 enclosure containing 4x 8TB Seagate Archive disks, connected via eSATA.

I am currently trying Duncan's suggestions. With them, the page allocation stall no longer seems to appear, and overall system responsiveness during copying is also much better.

Thanks,
David Arendt
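For anyone following along: as I understand them, the suggestions boil down to lowering the kernel's dirty-writeback thresholds, so that writeback starts earlier and less dirty data can pile up in the page cache while the slower destination array catches up. Roughly something like the following (the values here are placeholders for illustration, not necessarily the exact ones Duncan proposed):

    # Lower the dirty-writeback thresholds. Setting the *_bytes variants
    # overrides the percentage-based vm.dirty_ratio / vm.dirty_background_ratio.
    sysctl -w vm.dirty_background_bytes=$((256*1024*1024))   # start background writeback at 256 MiB of dirty data
    sysctl -w vm.dirty_bytes=$((1024*1024*1024))              # throttle writers once 1 GiB is dirty

To make the change persistent, the same vm.dirty_background_bytes / vm.dirty_bytes settings can go into /etc/sysctl.conf.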
Xin Zhou – Thu., 15 December 2016, 0:24
> Hi,
>
> There is a large amount of dirty data that probably cannot be committed to disk in time.
> According to the previous post, this seems to happen when copying from 7200 rpm to 5600 rpm disks.
>
> The I/Os are probably buffered and pending, unable to finish in time.
> It might be helpful to know whether this only happens with specific types of 5600 rpm disks.
>
> And are these disks in RAID groups? Thanks.
> Xin
>
>
> Sent: Wednesday, December 14, 2016 at 3:38 AM
> From: admin
> To: "Michal Hocko"
> Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, "David Sterba", "Chris Mason"
> Subject: Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
>
> Hi,
>
> I checked the log files and see no prior OOM killer invocation. Unfortunately the machine
> has been rebooted since. Next time it happens, I will also look in dmesg.
>
> Thanks,
> David Arendt
>
>
> Michal Hocko – Wed., 14 December 2016, 11:31
> > Btw. the stall should be preceded by the OOM killer invocation. Could
> > you share the OOM report, please? I am asking because such an OOM killer
> > invocation would clearly be premature given your meminfo. I am trying to
> > change that code, and seeing your numbers might help me.
> >
> > Thanks!
> >
> > On Wed 14-12-16 11:17:43, Michal Hocko wrote:
> > > On Tue 13-12-16 18:11:01, David Arendt wrote:
> > > > Hi,
> > > >
> > > > I receive the following page allocation stall while copying lots of
> > > > large files from one btrfs hdd to another.
> > > >
> > > > Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> > > > Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P O 4.9.0 #1
> > > [...]
> > > > Dec 13 13:04:29 server kernel: Call Trace:
> > > > Dec 13 13:04:29 server kernel: [] ? dump_stack+0x46/0x5d
> > > > Dec 13 13:04:29 server kernel: [] ? warn_alloc+0x111/0x130
> > > > Dec 13 13:04:33 server kernel: [] ? __alloc_pages_nodemask+0xbe8/0xd30
> > > > Dec 13 13:04:33 server kernel: [] ? pagecache_get_page+0xe4/0x230
> > > > Dec 13 13:04:33 server kernel: [] ? alloc_extent_buffer+0x10b/0x400
> > > > Dec 13 13:04:33 server kernel: [] ? btrfs_alloc_tree_block+0x125/0x560
> > >
> > > OK, so this is
> > > find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)
> > >
> > > The main question is whether this really needs to be a NOFS request...
> > >
> > > > Dec 13 13:04:33 server kernel: [] ? read_extent_buffer_pages+0x21f/0x280
> > > > Dec 13 13:04:33 server kernel: [] ? __btrfs_cow_block+0x141/0x580
> > > > Dec 13 13:04:33 server kernel: [] ? btrfs_cow_block+0x100/0x150
> > > > Dec 13 13:04:33 server kernel: [] ? btrfs_search_slot+0x1e9/0x9c0
> > > > Dec 13 13:04:33 server kernel: [] ? __set_extent_bit+0x512/0x550
> > > > Dec 13 13:04:33 server kernel: [] ? lookup_inline_extent_backref+0xf5/0x5e0
> > > > Dec 13 13:04:34 server kernel: [] ? set_extent_bit+0x24/0x30
> > > > Dec 13 13:04:34 server kernel: [] ? update_block_group.isra.34+0x114/0x380
> > > > Dec 13 13:04:34 server kernel: [] ? __btrfs_free_extent.isra.35+0xf4/0xd20
> > > > Dec 13 13:04:34 server kernel: [] ? btrfs_merge_delayed_refs+0x61/0x5d0
> > > > Dec 13 13:04:34 server kernel: [] ? __btrfs_run_delayed_refs+0x902/0x10a0
> > > > Dec 13 13:04:34 server kernel: [] ? btrfs_run_delayed_refs+0x90/0x2a0
> > > > Dec 13 13:04:34 server kernel: [] ? delayed_ref_async_start+0x84/0xa0
> > >
> > > What would cause the reclaim recursion?
> > >
> > > > Dec 13 13:04:34 server kernel: Mem-Info:
> > > > Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34 isolated_anon:0
> > > >  active_file:7370032 inactive_file:450105 isolated_file:320
> > > >  unevictable:0 dirty:522748 writeback:189 unstable:0
> > > >  slab_reclaimable:178255 slab_unreclaimable:124617
> > > >  mapped:4236 shmem:0 pagetables:1163 bounce:0
> > > >  free:38224 free_pcp:241 free_cma:0
> > >
> > > This speaks for itself. There is a lot of dirty data and basically no
> > > anonymous memory, so a GFP_NOFS allocation obviously cannot reclaim much.
> > > This is either a configuration bug, as somebody noted further down the
> > > thread (setting the dirty_ratio), or a suboptimality in the btrfs code,
> > > which might request NOFS even though it is not strictly necessary. That
> > > would be more for the btrfs developers.
> > > --
> > > Michal Hocko
> > > SUSE Labs
> >
> > --
> > Michal Hocko
> > SUSE Labs
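P.S. For scale: the dirty:522748 figure in the quoted Mem-Info dump is counted in pages, so with 4 KiB pages that is roughly 2 GB of dirty page cache waiting for writeback. A quick way to watch these counters while a copy is running (just a generic one-liner, not something from the original report):

    # Poll the global dirty/writeback/free counters once per second.
    watch -n 1 'grep -E "^(Dirty|Writeback|MemFree):" /proc/meminfo'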