Date: Tue, 11 Dec 2018 17:21:49 +0100
From: Michal Hocko
To: "Kirill A. Shutemov"
Cc: Andrew Morton, Liu Bo, Jan Kara, Dave Chinner, Theodore Ts'o, Johannes Weiner, Vladimir Davydov, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, LKML
Subject: Re: [PATCH] mm, memcg: fix reclaim deadlock with writeback
Message-ID: <20181211162149.GL1286@dhcp22.suse.cz>
References: <20181211132645.31053-1-mhocko@kernel.org> <20181211151542.2rjti4glj75honje@kshutemo-mobl1>
In-Reply-To: <20181211151542.2rjti4glj75honje@kshutemo-mobl1>

On Tue 11-12-18 18:15:42, Kirill A. Shutemov wrote:
> On Tue, Dec 11, 2018 at 02:26:45PM +0100, Michal Hocko wrote:
[...]
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2993,6 +2993,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> >  	struct vm_area_struct *vma = vmf->vma;
> >  	vm_fault_t ret;
> >  
> > +	/*
> > +	 * Preallocate pte before we take page_lock because this might lead to
> > +	 * deadlocks for memcg reclaim which waits for pages under writeback.
> > +	 */
> > +	if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
> > +		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm, vmf->address);
> > +		if (!vmf->prealloc_pte)
> > +			return VM_FAULT_OOM;
> > +		smp_wmb(); /* See comment in __pte_alloc() */
> > +	}
> > +
> >  	ret = vma->vm_ops->fault(vmf);
> >  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
> >  			    VM_FAULT_DONE_COW)))
>
> Sorry, but I don't think it fixes anything. Just hides it a level deeper.
>
> The trick with ->prealloc_pte works for faultaround because we can rely on
> ->map_pages() to not sleep and we know how it will setup page table entry.
> Basically, core controls most of the path.
>
> It's not the case with ->fault(). It is free to sleep and allocate
> whatever it wants.

Yeah, but if the fault callback wants to allocate then it has to
consider the usual allocation restrictions, e.g. NOFS if the allocation
itself can trip over fs locks.

> For instance, DAX page fault will setup page table entry on its own and
> return VM_FAULT_NOPAGE. It uses vmf_insert_mixed() to setup the page table
> and ignores your pre-allocated page table.

Does this happen with a page locked and with a __GFP_ACCOUNT allocation?
I am not familiar with that code, but I do not see it from a quick look.

> But it's just an example. The problem is that ->fault() is not bounded on
> what it can do, unlike ->map_pages().
That is a fair point, but the primary issue here is that the generic #PF
code breaks the underlying assumption and performs a
__GFP_ACCOUNT|GFP_KERNEL allocation from within a fs owned locked page.
-- 
Michal Hocko
SUSE Labs
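
[Illustration, not part of the original mail.] The ordering argument in this thread can be modelled in userspace as a minimal sketch. Everything here is hypothetical and only stands in for kernel concepts: struct fault_ctx plays the role of struct vm_fault, the page_locked flag models the page lock returned by ->fault(), and alloc_table() models an allocation that might enter reclaim. It is single-threaded and does not reproduce the deadlock; it only shows that preallocating first leaves nothing to allocate once the lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vm_fault. */
struct fault_ctx {
    void *prealloc;    /* models vmf->prealloc_pte */
    void *installed;   /* the table that ends up in use */
    bool page_locked;  /* models the page lock held after ->fault() */
};

/* Counts allocations made while the modelled page lock is held. */
static int allocs_under_lock;

/* Models an allocation that could enter reclaim (hypothetical helper). */
static void *alloc_table(const struct fault_ctx *ctx)
{
    if (ctx->page_locked)
        allocs_under_lock++;   /* this is the pattern the patch avoids */
    return calloc(1, 64);
}

/* Step 1: allocate before the page lock is taken. */
static int prepare_fault(struct fault_ctx *ctx)
{
    if (ctx->prealloc == NULL) {
        ctx->prealloc = alloc_table(ctx);
        if (ctx->prealloc == NULL)
            return -1;         /* models VM_FAULT_OOM */
    }
    return 0;
}

/* Step 2: with the lock held, only consume what was preallocated. */
static void finish_fault_locked(struct fault_ctx *ctx)
{
    ctx->page_locked = true;
    assert(ctx->prealloc != NULL);  /* nothing left to allocate here */
    ctx->installed = ctx->prealloc;
    ctx->prealloc = NULL;
    ctx->page_locked = false;
}

/* Full path: preallocate, then do the locked part. */
static int handle_fault(struct fault_ctx *ctx)
{
    if (prepare_fault(ctx) != 0)
        return -1;
    finish_fault_locked(ctx);
    return 0;
}
```

A caller would zero-initialise a struct fault_ctx, call handle_fault() on it, and free ctx.installed afterwards. Note that this models only the generic path; as Kirill points out, a ->fault() implementation that allocates on its own is outside what such preallocation can cover.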