Date: Wed, 9 May 2018 15:02:31 -0700
From: "Darrick J. Wong"
To: Michal Hocko
Cc: "Theodore Y. Ts'o", LKML, Artem Bityutskiy, Richard Weinberger,
    David Woodhouse, Brian Norris, Boris Brezillon, Marek Vasut,
    Cyrille Pitchen, Andreas Dilger, Steven Whitehouse, Bob Peterson,
    Trond Myklebust, Anna Schumaker, Adrian Hunter, Philippe Ombredanne,
    Kate Stewart, Mikulas Patocka, linux-mtd@lists.infradead.org,
    linux-ext4@vger.kernel.org, cluster-devel@redhat.com,
    linux-nfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: vmalloc with GFP_NOFS
Message-ID: <20180509220231.GD25312@magnolia>
References: <20180424162712.GL17484@dhcp22.suse.cz>
    <20180424183536.GF30619@thunk.org>
    <20180424192542.GS17484@dhcp22.suse.cz>
    <20180509134222.GU32366@dhcp22.suse.cz>
    <20180509151351.GA4111@magnolia>
    <20180509210447.GX32366@dhcp22.suse.cz>
In-Reply-To: <20180509210447.GX32366@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 09, 2018 at 11:04:47PM +0200, Michal Hocko wrote:
> On Wed 09-05-18 08:13:51, Darrick J. Wong wrote:
> > On Wed, May 09, 2018 at 03:42:22PM +0200, Michal Hocko wrote:
> > > On Tue 24-04-18 13:25:42, Michal Hocko wrote:
> > > [...]
> > > > > As a suggestion, could you take
> > > > > documentation about how to convert to the memalloc_nofs_{save,restore}
> > > > > scope api (which I think you've written about e-mails at length
> > > > > before), and put that into a file in Documentation/core-api?
> > > >
> > > > I can.
> > >
> > > Does something like the below sound reasonable/helpful?
> > > ---
> > > =================================
> > > GFP masks used from FS/IO context
> > > =================================
> > >
> > > :Date: May, 2018
> > > :Author: Michal Hocko
> > >
> > > Introduction
> > > ============
> > >
> > > FS resp. IO submitting code paths have to be careful when allocating
> >
> > Not sure what 'FS resp. IO' means here -- 'FS and IO' ?
> >
> > (Or is this one of those things where this looks like plain English text
> > but in reality it's some sort of markup that I'm not so familiar with?)
> >
> > Confused because I've seen 'resp.' used as shorthand for
> > 'responsible'...
>
> Well, I've tried to cover both. Filesystem and IO code paths which
> allocate while in sensitive context. IO submission is kinda clear but I
> am not sure what a general term for filesystem code paths would be. I
> would be grateful for any hints here.

"Code paths in the filesystem and IO stacks must be careful when
allocating memory to prevent recursion deadlocks caused by direct memory
reclaim calling back into the FS or IO paths and blocking on already
held resources (e.g. locks)." ?

--D

>
> >
> > > memory to prevent from potential recursion deadlocks caused by direct
> > > memory reclaim calling back into the FS/IO path and block on already
> > > held resources (e.g. locks).
> > > Traditional way to avoid this problem
> >
> > 'The traditional way to avoid this deadlock problem...'
>
> Done
>
> > > is to clear __GFP_FS resp. __GFP_IO (note the latter implies clearing
> > > the first as well) in the gfp mask when calling an allocator. GFP_NOFS
> > > resp. GFP_NOIO can be used as shortcut.
> > >
> > > This has been the traditional way to avoid deadlocks since ages. It
> >
> > I think this sentence is a little redundant with the previous sentence,
> > you could chop it out and join this paragraph to the one before it.
>
> OK
>
> >
> > > turned out though that above approach has led to abuses when the restricted
> > > gfp mask is used "just in case" without a deeper consideration which leads
> > > to problems because an excessive use of GFP_NOFS/GFP_NOIO can lead to
> > > memory over-reclaim or other memory reclaim issues.
> > >
> > > New API
> > > =======
> > >
> > > Since 4.12 we do have a generic scope API for both NOFS and NOIO context
> > > ``memalloc_nofs_save``, ``memalloc_nofs_restore`` resp. ``memalloc_noio_save``,
> > > ``memalloc_noio_restore`` which allow to mark a scope to be a critical
> > > section from the memory reclaim recursion into FS/IO POV. Any allocation
> > > from that scope will inherently drop __GFP_FS resp. __GFP_IO from the given
> > > mask so no memory allocation can recurse back in the FS/IO.
> > >
> > > FS/IO code then simply calls the appropriate save function right at
> > > the layer where a lock taken from the reclaim context (e.g. shrinker)
> > > is taken and the corresponding restore function when the lock is
> > > released. All that ideally along with an explanation what is the reclaim
> > > context for easier maintenance.
> > >
> > > What about __vmalloc(GFP_NOFS)
> > > ==============================
> > >
> > > vmalloc doesn't support GFP_NOFS semantic because there are hardcoded
> > > GFP_KERNEL allocations deep inside the allocator which are quit non-trivial
> >
> > ...which are quite non-trivial...
>
> fixed
>
> > > to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
> > > almost always a bug. The good news is that the NOFS/NOIO semantic can be
> > > achieved by the scope api.
> > >
> > > In the ideal world, upper layers should already mark dangerous contexts
> > > and so no special care is required and vmalloc should be called without
> > > any problems. Sometimes if the context is not really clear or there are
> > > layering violations then the recommended way around that is to wrap ``vmalloc``
> > > by the scope API with a comment explaining the problem.
> >
> > Otherwise looks ok to me based on my understanding of how all this is
> > supposed to work...
> >
> > Reviewed-by: Darrick J. Wong
>
> Thanks for your review!
>
> --
> Michal Hocko
> SUSE Labs
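
For readers following the thread, here is a rough userspace sketch of the
scope semantics described in the quoted draft. It is not kernel code: every
model_* name and every flag value below is made up for illustration, and the
only thing it tries to show is why a save/restore scope still works when an
allocator hardcodes a GFP_KERNEL-style mask deep inside, which is the vmalloc
case discussed above.

```c
#include <assert.h>

/* Hypothetical flag bits for this model; NOT the kernel's real values. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   MODEL_GFP_IO
#define MODEL_PF_NOFS    0x1u

/* Stands in for per-task state; a single variable is enough here. */
static unsigned int model_task_flags;

/* Enter a NOFS scope and return the previous state so scopes can nest. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave the scope by putting back whatever state the matching save saw. */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* Every model allocation filters its mask through the current scope. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/*
 * Models an allocator that ignores the caller's mask and hardcodes a
 * GFP_KERNEL-style mask internally. Returns the mask actually used.
 */
static unsigned int model_deep_alloc(void)
{
	return model_effective_gfp(MODEL_GFP_KERNEL);
}

/* Walks through nested scopes; returns 0 if every step behaves as expected. */
static int model_run_example(void)
{
	unsigned int outer, inner;

	assert(model_deep_alloc() == MODEL_GFP_KERNEL);

	/* Hypothetical lock shared with reclaim would be taken here. */
	outer = model_nofs_save();
	assert(model_deep_alloc() == MODEL_GFP_NOFS);

	inner = model_nofs_save();
	model_nofs_restore(inner);
	/* The outer scope is still active after the inner restore. */
	assert(model_deep_alloc() == MODEL_GFP_NOFS);

	model_nofs_restore(outer);
	assert(model_deep_alloc() == MODEL_GFP_KERNEL);
	return 0;
}
```

Calling model_run_example() from a main() exercises the whole sequence. The
point of returning the old state from the save call is that an inner restore
cannot accidentally end an outer scope, which matches the draft's advice to
place the save/restore pair at the layer that takes the reclaim-sensitive lock.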