Message-ID: <1a0a859b-1f25-5136-bb86-9efe68aabbb8@suse.cz>
Date: Tue, 24 May 2022 22:37:15 +0200
Subject: Re: Memory allocation on speculative fastpaths
To: Johannes Weiner, Suren Baghdasaryan
Cc: Matthew Wilcox, "Paul E. McKenney", Michal Hocko, "Liam R. Howlett", Michel Lespinasse, linux-mm, LKML, David Hildenbrand, Davidlohr Bueso
References: <20220503155913.GA1187610@paulmck-ThinkPad-P17-Gen-1> <20220503163905.GM1790663@paulmck-ThinkPad-P17-Gen-1>
From: Vlastimil Babka
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/4/22 18:23, Johannes Weiner wrote:
> On Tue, May 03, 2022 at 04:15:46PM -0700, Suren Baghdasaryan wrote:
>> On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox wrote:
>>>
>>> On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote:
>>>> On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
>>>>> On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
>>>>>> Hello!
>>>>>>
>>>>>> Just following up from off-list discussions yesterday.
>>>>>>
>>>>>> The requirements to allocate on an RCU-protected speculative fastpath
>>>>>> seem to be as follows:
>>>>>>
>>>>>> 1. Never sleep.
>>>>>> 2. Never reclaim.
>>>>>> 3. Leave emergency pools alone.
>>>>>>
>>>>>> Any others?
>>>>>>
>>>>>> If those rules suffice, and if my understanding of the GFP flags is
>>>>>> correct (ha!!!), then the following GFP flags should cover this:
>>>>>>
>>>>>> __GFP_NOMEMALLOC | __GFP_NOWARN
>>>>>
>>>>> GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
>>>>
>>>> Ah, good point on GFP_NOWAIT, thank you!
>>>
>>> Johannes (I think it was?) made the point to me that if we have another
>>> task very slowly freeing memory, a task in this path can take advantage
>>> of that other task's hard work and never go into reclaim.
>>> So the approach we should take is:
>
> Right, GFP_NOWAIT can starve out other allocations. It can clear out
> the freelists without the burden of having to do reclaim like
> everybody else wanting memory during a shortage. Including GFP_KERNEL.

FTR, I wonder if this is really true, given the suggested fallback. With
GFP_NOWAIT, you can see memory (in all applicable zones) as one of:

a) above the low watermark: just go ahead and allocate, as GFP_KERNEL would
b) between the min and low watermarks: wake up kswapd and allocate, as
   GFP_KERNEL would
c) below the min watermark: the most interesting case. GFP_KERNEL falls
   back to reclaim. If the GFP_NOWAIT path's fallback also includes
   reclaim, as suggested in this thread, how is it really different from
   GFP_KERNEL?

So am I missing something, or is a GFP_NOWAIT fastpath with an immediate
fallback that includes reclaim (and not just a retry loop) fundamentally
no different from GFP_KERNEL, regardless of how often we attempt it?

> In smaller doses and/or for privileged purposes (e.g. single-argument
> kfree_rcu ;)), those allocations are fine. But because the context is
> page tables specifically, it would mean that userspace could trigger a
> large number of those and DOS other applications and the kernel.
>
>>>   p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>>   pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>>   pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>>
>>>   if (failure) {
>>>     rcu_read_unlock();
>>>     do_reclaim();
>>>     return FAULT_FLAG_RETRY;
>>>   }
>>>
>>> ... but all this is now moot since the approach we agreed to yesterday
>>> is:
>>
>> I think the discussion was about the above approach and Johannes
>> suggested to fallback to the normal pagefault handling with mmap_lock
>> locked if PMD does not exist. Please correct me if I misunderstood
>> here.
>
> Yeah. Either way works, as long as the task is held accountable.
>
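
To make the three watermark cases concrete, here is a minimal userspace
sketch in C. It is not kernel code and makes several assumptions: the
struct zone_state, the enums and classify() are hypothetical names invented
for illustration, a single zone stands in for "all applicable zones", and
the watermark comparison is simplified (no lowmem reserves, no allocation
order). It only models who ends up doing reclaim in each case.

```c
#include <stdbool.h>

/* Hypothetical snapshot of one zone; NOT the kernel's struct zone. */
struct zone_state {
    unsigned long free_pages;
    unsigned long min_wmark;
    unsigned long low_wmark;
};

/* Stand-ins for a GFP_NOWAIT request and a GFP_KERNEL request. */
enum alloc_mode { MODE_NOWAIT, MODE_KERNEL };

enum alloc_outcome {
    ALLOC_OK,             /* (a) above low: just allocate */
    ALLOC_OK_WAKE_KSWAPD, /* (b) between min and low: wake kswapd, allocate */
    ALLOC_AFTER_RECLAIM,  /* (c) below min: the task does reclaim itself */
    ALLOC_FAIL            /* (c) below min: give up without reclaiming */
};

/*
 * Classify one allocation attempt. fallback_reclaims says whether the
 * caller's failure path performs reclaim before retrying (assumed
 * behaviour of the fallback discussed in the thread); it only matters
 * for MODE_NOWAIT.
 */
enum alloc_outcome classify(const struct zone_state *z, enum alloc_mode mode,
                            bool fallback_reclaims)
{
    if (z->free_pages > z->low_wmark)
        return ALLOC_OK;
    if (z->free_pages > z->min_wmark)
        return ALLOC_OK_WAKE_KSWAPD;
    /* Below min: GFP_KERNEL reclaims directly; NOWAIT depends on fallback. */
    if (mode == MODE_KERNEL || fallback_reclaims)
        return ALLOC_AFTER_RECLAIM;
    return ALLOC_FAIL;
}
```

Under these assumptions, classify() returns the same outcome for
MODE_NOWAIT and MODE_KERNEL in all three cases whenever fallback_reclaims
is true. The two modes diverge only in case (c) when the fallback does not
reclaim.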