Date: Mon, 27 Mar 2023 17:10:10 +0200
From: Michal Hocko
To: Vlastimil Babka
Cc: Mike Rapoport, linux-mm@kvack.org, Andrew Morton, Dave Hansen,
 Peter Zijlstra, Rick Edgecombe, Song Liu, Thomas Gleixner,
 linux-kernel@vger.kernel.org, x86@kernel.org, Mel Gorman
Subject: Re: [RFC PATCH 1/5] mm: intorduce __GFP_UNMAPPED and unmapped_alloc()
References: <20230308094106.227365-1-rppt@kernel.org>
 <20230308094106.227365-2-rppt@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 27-03-23 16:31:45, Vlastimil Babka wrote:
> On 3/27/23 15:43, Michal Hocko wrote:
> > On Sat 25-03-23 09:38:12, Mike Rapoport wrote:
> >> On Fri, Mar 24, 2023 at 09:37:31AM +0100, Michal Hocko wrote:
> >> > On Wed 08-03-23 11:41:02, Mike Rapoport wrote:
> >> > > From: "Mike Rapoport (IBM)"
> >> > >
> >> > > When the set_memory or set_direct_map APIs are used to change the
> >> > > attributes or permissions of chunks of several pages, the large PMD
> >> > > that maps these pages in the direct map must be split. Fragmenting
> >> > > the direct map in this manner causes TLB pressure and, eventually,
> >> > > performance degradation.
> >> > >
> >> > > To avoid excessive direct map fragmentation, add the ability to
> >> > > allocate "unmapped" pages with the __GFP_UNMAPPED flag, which causes
> >> > > the allocated pages to be removed from the direct map, and use a
> >> > > cache of the unmapped pages.
> >> > >
> >> > > This cache is replenished with higher order pages, with a preference
> >> > > for PMD_SIZE pages when possible, so that there will be fewer splits
> >> > > of large pages in the direct map.
> >> > >
> >> > > The cache is implemented as a buddy allocator, so it can serve high
> >> > > order allocations of unmapped pages.
> >> >
> >> > Why do we need a dedicated gfp flag for all this when a dedicated
> >> > allocator is used anyway? What prevents users from calling
> >> > unmapped_pages_{alloc,free}?
> >>
> >> Using unmapped_pages_{alloc,free} adds complexity to the users, which IMO
> >> outweighs the cost of a dedicated gfp flag.
> >
> > Aren't those users rare and very special anyway?
>
> I think it's mostly about the freeing that can happen from a generic context
> not aware of the special allocation, so it's not about how rare it is, but
> how complex it would be to exhaustively determine those contexts and do
> something in them.

Yes, I can see a challenge with put_page users, but that is not really
related to the gfp flag, as those are only relevant for the allocation
context.

> >> For modules we'd have to make x86::module_{alloc,free}() take care of
> >> mapping and unmapping the allocated pages in the modules' virtual address
> >> range. This also might become relevant for other architectures in the
> >> future, and then we'll have several complex module_alloc()s.
> >
> > The module_alloc use is lacking any justification. More context would be
> > more than useful.
> > Also vmalloc support for the proposed __GFP_UNMAPPED likely needs more
> > explanation as well.
> >
> >> And for secretmem, while using unmapped_pages_alloc() is easy, the free
> >> path becomes really complex because actual page freeing for fd-based
> >> memory is deeply buried in the page cache code.
> >
> > Why is that a problem? You already hook into the page freeing path and
> > special-case unmapped memory.
>
> But the proposal of unmapped_pages_free() would suggest this would no longer
> be the case?

I can see a check in the freeing path.

> But maybe we could, as a compromise, provide unmapped_pages_alloc() to get
> rid of the new __GFP flag, provide unmapped_pages_free() to annotate places
> that are known to free unmapped memory explicitly, but the generic page
> freeing would also keep the hook?

Honestly, I do not see a different option if those pages are to be
reference counted, unless they can use a destructor concept like hugetlb
pages. At least the secretmem use case cannot, AFAICS.
-- 
Michal Hocko
SUSE Labs
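For illustration only, here is a minimal userspace sketch of the two ideas in the quoted thread: a cache of unmapped pages that is refilled one PMD-sized block at a time and managed as a buddy allocator, plus a generic freeing path that keeps a hook routing unmapped pages back to the cache alongside an explicit free helper. It is not the RFC's code and does not touch any real direct map. It assumes 4 KiB pages and 2 MiB PMDs (order 9), and every name in it (model_unmapped_alloc(), model_unmapped_free(), model_free_pages(), and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not the RFC's implementation or any kernel
 * API. Assumes 4 KiB pages and 2 MiB PMDs, i.e. PMD-sized blocks of order 9.
 * "Unmapping" is modeled as a per-page flag only.
 */
#define MODEL_PMD_ORDER 9
#define MODEL_PMD_PAGES (1 << MODEL_PMD_ORDER)
#define MODEL_NR_PMDS   4
#define MODEL_NR_PAGES  (MODEL_NR_PMDS * MODEL_PMD_PAGES)

static bool page_unmapped[MODEL_NR_PAGES];  /* removed from the modeled direct map */
static int free_head_order[MODEL_NR_PAGES]; /* order of a free block starting here, -1 if none */
static int next_pmd;                        /* next PMD block taken by a refill */
static int normal_frees;                    /* frees that took the ordinary path */
static bool model_ready;

static void model_init(void)
{
	if (model_ready)
		return;
	for (int i = 0; i < MODEL_NR_PAGES; i++)
		free_head_order[i] = -1;
	model_ready = true;
}

/* Refill: take one whole PMD-aligned block, "unmap" it, cache it as one free block. */
static bool cache_refill(void)
{
	if (next_pmd == MODEL_NR_PMDS)
		return false;
	int base = next_pmd++ * MODEL_PMD_PAGES;
	for (int i = 0; i < MODEL_PMD_PAGES; i++)
		page_unmapped[base + i] = true;
	free_head_order[base] = MODEL_PMD_ORDER;
	return true;
}

/* Allocate 2^order unmapped pages; returns the first page index, or -1 when exhausted. */
int model_unmapped_alloc(int order)
{
	assert(order >= 0 && order <= MODEL_PMD_ORDER);
	model_init();
	for (;;) {
		/* Pick the smallest free block that fits (a linear scan is fine for a model). */
		int best = -1;
		for (int i = 0; i < MODEL_NR_PAGES; i++)
			if (free_head_order[i] >= order &&
			    (best < 0 || free_head_order[i] < free_head_order[best]))
				best = i;
		if (best >= 0) {
			int o = free_head_order[best];
			free_head_order[best] = -1;
			/* Split down to the requested order; the upper halves stay cached. */
			while (o > order) {
				o--;
				free_head_order[best + (1 << o)] = o;
			}
			return best;
		}
		if (!cache_refill())
			return -1;
	}
}

/* Explicit free for callers that know the pages are unmapped; merges free buddies. */
void model_unmapped_free(int pfn, int order)
{
	assert(pfn >= 0 && pfn < MODEL_NR_PAGES && page_unmapped[pfn]);
	while (order < MODEL_PMD_ORDER) {
		int buddy = pfn ^ (1 << order);
		if (free_head_order[buddy] != order)
			break;
		free_head_order[buddy] = -1;
		if (buddy < pfn)
			pfn = buddy;
		order++;
	}
	free_head_order[pfn] = order;
}

/* Generic free path (e.g. last reference dropped): the hook routes unmapped pages to the cache. */
void model_free_pages(int pfn, int order)
{
	if (pfn >= 0 && pfn < MODEL_NR_PAGES && page_unmapped[pfn])
		model_unmapped_free(pfn, order);
	else
		normal_frees++; /* stand-in for the ordinary page allocator */
}

int model_free_order(int pfn) { return free_head_order[pfn]; }
int model_normal_frees(void) { return normal_frees; }
```

In this model, allocating orders 0, 0 and 2 from an empty cache returns pages 0, 1 and 4 of the first PMD block, and freeing all three through the generic model_free_pages() path coalesces them back into a single order-9 block. Because refills are whole PMD-aligned blocks, the unmapped pages stay confined to as few 2 MiB regions as possible, which is the property the patch description relies on to reduce large-page splits in the direct map.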