Date: Fri, 14 Dec 2018 09:49:48 +0100
From: Michal Hocko
To: Johannes Weiner
Cc: Andrew Morton, "Kirill A. Shutemov", Liu Bo, Jan Kara, Dave Chinner,
	Theodore Ts'o, Vladimir Davydov, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, LKML, Shakeel Butt, Stable tree
Subject: Re: [PATCH v3] mm, memcg: fix reclaim deadlock with writeback
Message-ID: <20181214084948.GA5624@dhcp22.suse.cz>
References: <20181212155055.1269-1-mhocko@kernel.org>
	<20181213092221.27270-1-mhocko@kernel.org>
	<20181213220400.GA9829@cmpxchg.org>
In-Reply-To: <20181213220400.GA9829@cmpxchg.org>

On Thu 13-12-18 17:04:00, Johannes Weiner wrote:
[...]
> Acked-by: Johannes Weiner

Thanks!

> Just one nit:
>
> > @@ -2993,6 +2993,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> >  	struct vm_area_struct *vma = vmf->vma;
> >  	vm_fault_t ret;
> >
> > +	/*
> > +	 * Preallocate pte before we take page_lock because this might lead to
> > +	 * deadlocks for memcg reclaim which waits for pages under writeback.
> > +	 */
> > +	if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
> > +		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm, vmf->address);
> > +		if (!vmf->prealloc_pte)
> > +			return VM_FAULT_OOM;
> > +		smp_wmb(); /* See comment in __pte_alloc() */
> > +	}
>
> Could you be more specific in the deadlock comment? git blame will
> work fine for a while, but it becomes a pain to find corresponding
> patches after stuff gets moved around for years.
>
> In particular the race diagram between reclaim with a page lock held
> and the fs doing SetPageWriteback batches before kicking off IO would
> be useful directly in the code, IMO.

This?

diff --git a/mm/memory.c b/mm/memory.c
index bb78e90a9b70..ece221e4da6d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2995,7 +2995,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)

 	/*
 	 * Preallocate pte before we take page_lock because this might lead to
-	 * deadlocks for memcg reclaim which waits for pages under writeback.
+	 * deadlocks for memcg reclaim which waits for pages under writeback:
+	 *				lock_page(A)
+	 *				SetPageWriteback(A)
+	 *				unlock_page(A)
+	 * lock_page(B)
+	 *				lock_page(B)
+	 * pte_alloc_one
+	 *   shrink_page_list
+	 *     wait_on_page_writeback(A)
+	 *				SetPageWriteback(B)
+	 *				unlock_page(B)
+	 *				# flush A, B to clear the writeback
 	 */
 	if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
 		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm, vmf->address);
-- 
Michal Hocko
SUSE Labs
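
For readers who want to convince themselves of the cycle in the diagram above, here is a minimal userspace sketch of the wait-for relation. It is not kernel code. The names fault_deadlocks(), has_cycle() and check_scenarios() are hypothetical and exist only for this illustration. It also assumes the worst case throughout: the pte allocation always enters memcg reclaim and always ends up waiting on page A.

```c
#include <assert.h>
#include <stdbool.h>

/* The two parties from the race diagram (illustrative model only). */
enum { NOBODY = -1, FAULT = 0, FS = 1, NR_ACTORS = 2 };

/* waits_for[x] == y means actor x cannot make progress until y does. */
static bool has_cycle(const int waits_for[NR_ACTORS])
{
	for (int start = 0; start < NR_ACTORS; start++) {
		int cur = waits_for[start];

		/* Follow the chain; a return to 'start' means a cycle. */
		for (int hops = 0; cur != NOBODY && hops < NR_ACTORS; hops++) {
			if (cur == start)
				return true;
			cur = waits_for[cur];
		}
	}
	return false;
}

/*
 * Assumption: the pte allocation enters memcg reclaim and waits for
 * writeback on page A, which only the fs side can clear by flushing
 * its batch.
 */
bool fault_deadlocks(bool prealloc_pte)
{
	int waits_for[NR_ACTORS] = { NOBODY, NOBODY };

	/* wait_on_page_writeback(A): the fault path depends on the fs. */
	waits_for[FAULT] = FS;

	/*
	 * Without preallocation the fault path allocates while holding
	 * the lock on page B, so the fs blocks in lock_page(B) and can
	 * never flush A.
	 */
	if (!prealloc_pte)
		waits_for[FS] = FAULT;

	return has_cycle(waits_for);
}

/* Sanity checks for the two orderings discussed in this thread. */
void check_scenarios(void)
{
	assert(fault_deadlocks(false));	/* allocate under page lock */
	assert(!fault_deadlocks(true));	/* preallocate before lock_page */
}
```

With preallocation the fault path may still wait for writeback on A, but it holds no page lock while doing so, so the fs side can take B, finish its batch and submit the IO.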