Date: Thu, 9 Mar 2023 17:14:40 +0200
From: Mike Rapoport
To: "Edgecombe, Rick P"
Cc: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org",
	"peterz@infradead.org", "tglx@linutronix.de", "song@kernel.org",
	"dave.hansen@linux.intel.com", "vbabka@suse.cz", "x86@kernel.org",
	"akpm@linux-foundation.org"
Subject: Re: [RFC PATCH 0/5] Prototype for direct map awareness in page allocator
References: <20230308094106.227365-1-rppt@kernel.org>

On Thu, Mar 09, 2023 at 01:59:00AM +0000, Edgecombe, Rick P wrote:
> On Wed, 2023-03-08 at 11:41 +0200, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)"
> >
> > Hi,
> >
> > This is a third attempt to make the page allocator aware of the
> > direct map layout and allow grouping of the pages that must be
> > unmapped from the direct map.
> >
> > This is a new implementation of __GFP_UNMAPPED, kind of a follow-up
> > to this set:
> >
> > https://lore.kernel.org/all/20220127085608.306306-1-rppt@kernel.org
> >
> > but instead of using a migrate type to cache the unmapped pages,
> > the current implementation adds a dedicated cache to serve
> > __GFP_UNMAPPED allocations.
>
> It seems a downside to having a page allocator outside of _the_ page
> allocator is that you don't get all of the features that are baked in
> there. For example, does secretmem care about NUMA? I guess in this
> implementation there is just one big cache for all nodes.
>
> Probably most users would want __GFP_ZERO. Would secretmem care about
> __GFP_ACCOUNT?

The intention was that the pages in the cache are always zeroed, so
__GFP_ZERO is always implicitly there, or at least it should have been.
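Schematically (a simplified sketch with illustrative helper names, not the
actual code from the patches), the hook in __alloc_pages() looks something
like this:

struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
			   nodemask_t *nodemask)
{
	struct page *page = NULL;

	if (gfp & __GFP_UNMAPPED) {
		/*
		 * Pages in the dedicated cache were removed from the direct
		 * map and zeroed when the cache was populated, so __GFP_ZERO
		 * is effectively implicit here.
		 */
		page = unmapped_cache_alloc(order);	/* illustrative name */
		if (page)
			goto out;
	}

	/* ... the usual fastpath/slowpath buddy allocation ... */

out:
	/*
	 * Common exit path: memcg charging for __GFP_ACCOUNT is handled
	 * here, for pages taken from the cache as well.
	 */
	return page;
}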
__GFP_ACCOUNT is respected in this implementation. If you look at the
changes to __alloc_pages(), after getting pages from the unmapped cache
there is a 'goto out' to the point where the accounting is handled.

> I'm sure there is more, but I guess the question is, is the idea that
> these features all get built into unmapped-alloc at some point? The
> alternate approach is to have little caches for each usage, like the
> grouped pages, which is probably less efficient when you have a bunch
> of them. Or solve it just for modules, like the bpf allocator. Those
> are the tradeoffs for the approaches that have been explored, right?

I think that no matter what cache we use, it won't be able to support all
the features _the_ page allocator has. If we had a per-use-case cache
implementation we could tune it to support the features of interest for
that use case, but then we'd be less efficient at reducing splits of the
large pages, not to mention the increase in complexity from several caches
doing similar yet different things.

This POC mostly targets secretmem and modules, so it was pretty much about
GFP_KERNEL without considering NUMA. I think extending the unmapped alloc
to be NUMA-aware should be simple enough, but it will increase the memory
overhead even more.

-- 
Sincerely yours,
Mike.
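P.S. To make the NUMA point concrete: a NUMA-aware extension would mean
keeping one cache per node instead of a single global one, roughly along
the lines of the purely illustrative sketch below (not from the patches).
Each node then holds its own partially consumed unmapped large pages,
which is where the extra memory overhead comes from.

/* Purely illustrative: a per-node cache of pages that have already been
 * removed from the direct map and zeroed. */
struct unmapped_cache {
	spinlock_t		lock;
	struct list_head	free_list;
	unsigned long		nr_free;
};

static struct unmapped_cache unmapped_caches[MAX_NUMNODES];

static struct page *unmapped_cache_alloc_node(int nid)
{
	struct unmapped_cache *uc = &unmapped_caches[nid];
	struct page *page = NULL;

	spin_lock(&uc->lock);
	if (!list_empty(&uc->free_list)) {
		page = list_first_entry(&uc->free_list, struct page, lru);
		list_del(&page->lru);
		uc->nr_free--;
	}
	spin_unlock(&uc->lock);

	/*
	 * When the node-local cache is empty, a fresh large page would be
	 * allocated on this node, removed from the direct map and split to
	 * refill the cache (omitted here).
	 */
	return page;
}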