From: Shakeel Butt
Date: Wed, 12 Dec 2018 09:24:45 -0800
Subject: Re: [PATCH v2] mm, memcg: fix reclaim deadlock with writeback
To: Michal Hocko
Cc: Andrew Morton, "Kirill A. Shutemov", bo.liu@linux.alibaba.com, Jan Kara, david@fromorbit.com, tytso@mit.edu, Johannes Weiner, Vladimir Davydov, Linux MM, linux-fsdevel, LKML, Michal Hocko
References: <20181211132645.31053-1-mhocko@kernel.org> <20181212155055.1269-1-mhocko@kernel.org>
In-Reply-To: <20181212155055.1269-1-mhocko@kernel.org>

On Wed, Dec 12, 2018 at 7:51 AM Michal Hocko wrote:
>
> From: Michal Hocko
>
> Liu Bo has experienced a deadlock between memcg (legacy) reclaim and the
> ext4 writeback
> task1:
> [] wait_on_page_bit+0x82/0xa0
> [] shrink_page_list+0x907/0x960
> [] shrink_inactive_list+0x2c7/0x680
> [] shrink_node_memcg+0x404/0x830
> [] shrink_node+0xd8/0x300
> [] do_try_to_free_pages+0x10d/0x330
> [] try_to_free_mem_cgroup_pages+0xd5/0x1b0
> [] try_charge+0x14d/0x720
> [] memcg_kmem_charge_memcg+0x3c/0xa0
> [] memcg_kmem_charge+0x7e/0xd0
> [] __alloc_pages_nodemask+0x178/0x260
> [] alloc_pages_current+0x95/0x140
> [] pte_alloc_one+0x17/0x40
> [] __pte_alloc+0x1e/0x110
> [] alloc_set_pte+0x5fe/0xc20
> [] do_fault+0x103/0x970
> [] handle_mm_fault+0x61e/0xd10
> [] __do_page_fault+0x252/0x4d0
> [] do_page_fault+0x30/0x80
> [] page_fault+0x28/0x30
> [] 0xffffffffffffffff
>
> task2:
> [] __lock_page+0x86/0xa0
> [] mpage_prepare_extent_to_map+0x2e7/0x310 [ext4]
> [] ext4_writepages+0x479/0xd60
> [] do_writepages+0x1e/0x30
> [] __writeback_single_inode+0x45/0x320
> [] writeback_sb_inodes+0x272/0x600
> [] __writeback_inodes_wb+0x92/0xc0
> [] wb_writeback+0x268/0x300
> [] wb_workfn+0xb4/0x390
> [] process_one_work+0x189/0x420
> [] worker_thread+0x4e/0x4b0
> [] kthread+0xe6/0x100
> [] ret_from_fork+0x41/0x50
> [] 0xffffffffffffffff
>
> He adds
> : task1 is waiting for the PageWriteback bit of the page that task2 has
> : collected in mpd->io_submit->io_bio, and task2 is waiting for the LOCKED
> : bit of the page which task1 has locked.
>
> More precisely task1 is handling a page fault and it has a page locked
> while it charges a new page table to a memcg. That in turn hits a memory
> limit reclaim and the memcg reclaim for legacy controller is waiting on
> the writeback but that is never going to finish because the writeback
> itself is waiting for the page locked in the #PF path. So this is
> essentially an ABBA deadlock.
>
> Waiting for the writeback in legacy memcg controller is a workaround
> for premature OOM killer invocations because there is no dirty IO
> throttling available for the controller. There is no easy way around
> that unfortunately. Therefore fix this specific issue by pre-allocating
> the page table outside of the page lock. We have that handy
> infrastructure for that already so simply reuse the fault-around pattern
> which already does this.

Michal, can you please also add the following paragraph, which was in
the first version, to the commit message? This fact should be documented
at least in the commit message.

>
> There are probably other hidden __GFP_ACCOUNT | GFP_KERNEL allocations
> from under a fs page locked but they should be really rare. I am not
> aware of a better solution unfortunately.
>

thanks,
Shakeel