From: Michal Hocko
To: Jonathan Corbet
Cc: Dave Chinner, Randy Dunlap, Mike Rapoport, LKML, Michal Hocko
Subject: [PATCH v2] doc: document scope NOFS, NOIO APIs
Date: Tue, 29 May 2018 10:26:44 +0200
Message-Id: <20180529082644.26192-1-mhocko@kernel.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180524114341.1101-1-mhocko@kernel.org>
References: <20180524114341.1101-1-mhocko@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Michal Hocko

Although the API is documented in the source code, Ted has pointed out
that there is no mention of it in the core-api Documentation, and there
are people looking there to find answers on how to use a specific API.

Changes since v1
- add kerneldoc for the API - suggested by Jonathan
- review feedback from Dave and Jonathan
- feedback from Dave about more general critical context rather than
  locking
- feedback from Mike
- typo fixed - Randy, Dave

Requested-by: "Theodore Y. Ts'o"
Signed-off-by: Michal Hocko
---
 .../core-api/gfp_mask-from-fs-io.rst          | 61 +++++++++++++++++++
 Documentation/core-api/index.rst              |  1 +
 include/linux/sched/mm.h                      | 38 ++++++++++++
 3 files changed, 100 insertions(+)
 create mode 100644 Documentation/core-api/gfp_mask-from-fs-io.rst

diff --git a/Documentation/core-api/gfp_mask-from-fs-io.rst b/Documentation/core-api/gfp_mask-from-fs-io.rst
new file mode 100644
index 000000000000..2dc442b04a77
--- /dev/null
+++ b/Documentation/core-api/gfp_mask-from-fs-io.rst
@@ -0,0 +1,61 @@
+=================================
+GFP masks used from FS/IO context
+=================================
+
+:Date: May, 2018
+:Author: Michal Hocko
+
+Introduction
+============
+
+Code paths in the filesystem and IO stacks must be careful when
+allocating memory to prevent recursion deadlocks caused by direct
+memory reclaim calling back into the FS or IO paths and blocking on
+already held resources (e.g. locks - most commonly those used for the
+transaction context).
+
+The traditional way to avoid this deadlock problem is to clear __GFP_FS
+or __GFP_IO (note that clearing the latter implies clearing the former as
+well) in the gfp mask when calling an allocator. GFP_NOFS and GFP_NOIO can
+be used as shortcuts. It turned out though that this approach has led to
+abuses where the restricted gfp mask is used "just in case" without
+deeper consideration, which leads to problems because excessive use
+of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory
+reclaim issues.
+
+New API
+=======
+
+Since 4.12 there is a generic scope API for both NOFS and NOIO contexts:
+``memalloc_nofs_save``, ``memalloc_nofs_restore`` and ``memalloc_noio_save``,
+``memalloc_noio_restore`` respectively. They allow marking a scope as a
+critical section from a filesystem or I/O point of view. Any allocation from
+that scope will inherently drop __GFP_FS or __GFP_IO respectively from the
+given mask so no memory allocation can recurse back into the FS/IO.
+
+FS/IO code then simply calls the appropriate save function before
+any critical section with respect to the reclaim is started - e.g.
+a lock shared with the reclaim context or when a transaction context
+nesting would be possible via reclaim. The restore function should be
+called when the critical section ends. All that should ideally come with
+an explanation of what the reclaim context is, for easier maintenance.
+
+Please note that the proper pairing of save/restore functions
+allows nesting so it is safe to call ``memalloc_noio_save`` or
+``memalloc_noio_restore`` from within an existing NOIO or NOFS
+scope.
+
+What about __vmalloc(GFP_NOFS)
+==============================
+
+vmalloc doesn't support GFP_NOFS semantics because there are hardcoded
+GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
+to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
+almost always a bug. The good news is that the NOFS/NOIO semantics can be
+achieved by the scope API.
+
+In an ideal world, upper layers should already mark dangerous contexts
+so that no special care is required and vmalloc can be called without
+any problems. Sometimes, if the context is not really clear or there are
+layering violations, the recommended way around that is to wrap ``vmalloc``
+in the scope API with a comment explaining the problem.
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8a5f48ef16f2 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
    genalloc
    errseq
    printk-formats
+   gfp_mask-from-fs-io
 
 Interfaces for kernel debugging
 ===============================
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index e1f8411e6b80..af5ba077bbc4 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -166,6 +166,17 @@ static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
 static inline void fs_reclaim_release(gfp_t gfp_mask) { }
 #endif
 
+/**
+ * memalloc_noio_save - Marks implicit GFP_NOIO allocation scope.
+ *
+ * This function marks the beginning of the GFP_NOIO allocation scope.
+ * All further allocations will implicitly drop __GFP_IO flag and so
+ * they are safe for the IO critical section from the allocation recursion
+ * point of view. Use memalloc_noio_restore to end the scope with flags
+ * returned by this function.
+ *
+ * This function is safe to be used from any context.
+ */
 static inline unsigned int memalloc_noio_save(void)
 {
 	unsigned int flags = current->flags & PF_MEMALLOC_NOIO;
@@ -173,11 +184,30 @@ static inline unsigned int memalloc_noio_save(void)
 	return flags;
 }
 
+/**
+ * memalloc_noio_restore - Ends the implicit GFP_NOIO scope.
+ * @flags: Flags to restore.
+ *
+ * Ends the implicit GFP_NOIO scope started by memalloc_noio_save function.
+ * Always make sure that the given flags is the return value from the
+ * pairing memalloc_noio_save call.
+ */
 static inline void memalloc_noio_restore(unsigned int flags)
 {
 	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
 }
 
+/**
+ * memalloc_nofs_save - Marks implicit GFP_NOFS allocation scope.
+ *
+ * This function marks the beginning of the GFP_NOFS allocation scope.
+ * All further allocations will implicitly drop __GFP_FS flag and so
+ * they are safe for the FS critical section from the allocation recursion
+ * point of view. Use memalloc_nofs_restore to end the scope with flags
+ * returned by this function.
+ *
+ * This function is safe to be used from any context.
+ */
 static inline unsigned int memalloc_nofs_save(void)
 {
 	unsigned int flags = current->flags & PF_MEMALLOC_NOFS;
@@ -185,6 +215,14 @@ static inline unsigned int memalloc_nofs_save(void)
 	return flags;
 }
 
+/**
+ * memalloc_nofs_restore - Ends the implicit GFP_NOFS scope.
+ * @flags: Flags to restore.
+ *
+ * Ends the implicit GFP_NOFS scope started by memalloc_nofs_save function.
+ * Always make sure that the given flags is the return value from the
+ * pairing memalloc_nofs_save call.
+ */
 static inline void memalloc_nofs_restore(unsigned int flags)
 {
 	current->flags = (current->flags & ~PF_MEMALLOC_NOFS) | flags;
-- 
2.17.0
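
Not part of the patch: to illustrate why properly paired save/restore
calls nest safely, here is a small userspace model of the scope API. It
is only a sketch and not kernel code. All model_* names and the bit
values are made up for illustration, model_task_flags stands in for
current->flags, and model_effective_gfp() is an assumed simplification
of how the allocator masks a gfp request inside a scope.

```c
/*
 * Userspace model of the NOFS/NOIO scope API. NOT kernel code: every
 * model_* / MODEL_* name and every bit value here is hypothetical.
 */
#include <assert.h>

#define MODEL_GFP_IO		0x1u	/* stands in for __GFP_IO */
#define MODEL_GFP_FS		0x2u	/* stands in for __GFP_FS */
#define MODEL_GFP_KERNEL	(MODEL_GFP_IO | MODEL_GFP_FS)

#define MODEL_PF_NOIO		0x1u	/* stands in for PF_MEMALLOC_NOIO */
#define MODEL_PF_NOFS		0x2u	/* stands in for PF_MEMALLOC_NOFS */

/* Stands in for current->flags of the running task. */
static unsigned int model_task_flags;

/* Start a NOFS scope; returns the previous state of the NOFS bit. */
static inline unsigned int model_nofs_save(void)
{
	unsigned int flags = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return flags;
}

/* End a NOFS scope by putting back the state returned by the save. */
static inline void model_nofs_restore(unsigned int flags)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | flags;
}

/* Start a NOIO scope; returns the previous state of the NOIO bit. */
static inline unsigned int model_noio_save(void)
{
	unsigned int flags = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return flags;
}

/* End a NOIO scope by putting back the state returned by the save. */
static inline void model_noio_restore(unsigned int flags)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | flags;
}

/* The mask an allocation would effectively use for a @gfp request. */
static inline unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);	/* NOIO implies NOFS */
	else if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* Returns 1 if ending an inner scope leaves the outer scope active. */
static inline int model_nested_scope_holds(void)
{
	unsigned int outer, inner, seen;

	model_task_flags = 0;
	outer = model_nofs_save();	/* outermost scope, bit was clear */
	inner = model_nofs_save();	/* nested scope, bit was already set */
	model_nofs_restore(inner);	/* inner ends, outer must still hold */
	seen = model_effective_gfp(MODEL_GFP_KERNEL);
	model_nofs_restore(outer);	/* outer ends, restrictions are gone */

	return seen == MODEL_GFP_IO &&
	       model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL;
}
```

The point of the model is that the restore side only puts back what the
matching save returned, so an inner restore cannot clear the bit set by
an outer scope. Passing a value other than the one returned by the
pairing save call breaks this, which is what the kerneldoc above warns
about.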