From: Roman Gushchin
To: Johannes Weiner
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Rik van Riel,
 Joonsoo Kim, Minchan Kim, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: page_alloc: consume available CMA space first
Date: Thu, 27 Jul 2023 10:08:08 -0700
References: <20230726145304.1319046-1-hannes@cmpxchg.org>
 <20230727153413.GA1378510@cmpxchg.org>
In-Reply-To: <20230727153413.GA1378510@cmpxchg.org>

On Thu, Jul 27, 2023 at 11:34:13AM -0400, Johannes Weiner wrote:
> On Wed, Jul 26, 2023 at 04:38:11PM -0700, Roman Gushchin wrote:
> > On Wed, Jul 26, 2023 at 10:53:04AM -0400, Johannes Weiner wrote:
> > > On a memcache setup with heavy anon usage and no swap, we routinely
> > > see premature OOM kills with multiple gigabytes of free space left:
> > >
> > >   Node 0 Normal free:4978632kB [...] free_cma:4893276kB
> > >
> > > This free space turns out to be CMA. We set CMA regions aside for
> > > potential hugetlb users on all of our machines, figuring that even if
> > > there aren't any, the memory is available to userspace allocations.
> > >
> > > When the OOMs trigger, it's from unmovable and reclaimable allocations
> > > that aren't allowed to dip into CMA. The non-CMA regions meanwhile are
> > > dominated by the anon pages.
> > >
> > > Because we have more options for CMA pages, change the policy to
> > > always fill up CMA first. This reduces the risk of premature OOMs.
> >
> > I suspect it might cause regressions on small(er) devices where
> > a relatively small CMA area (a few megabytes) is often reserved for
> > use by various device drivers, which can't handle allocation failures
> > well (even interim allocation failures). Startup time can regress too:
> > migrating pages out of CMA will take time.
>
> The page allocator is currently happy to give away all CMA memory to
> movables before entering reclaim. It will use CMA even before falling
> back to a different migratetype.
>
> Do these small setups take special precautions to never fill memory?
> Proactively trim file cache? Never swap? Because AFAICS, unless they
> do so, this would only change the timing of when CMA fills up, not if.

Imagine something like a webcam or a router. It boots up, brings up some
custom drivers/hardware, starts some daemons and runs forever. It might
never reach memory capacity, or it might take hours or days. The point is
that during initialization CMA is fully available.

> > And given the velocity of kernel upgrades on such devices, we won't
> > learn about it for the next couple of years.
>
> That's true. However, a potential regression with this would show up
> fairly early in kernel validation since CMA would fill up on a more
> predictable timeline. And the change is easy to revert, too.
>
> Given that we have a concrete problem with the current behavior, I
> think it's fair to require a higher bar for proof that this will
> indeed cause a regression elsewhere before raising the bar on the fix.

I'm not opposing the change, just raising a concern. I expect that we'll
need a more complicated solution at some point anyway.

> > > Movable pages can be migrated out of CMA when necessary, but we don't
> > > have a mechanism to migrate them *into* CMA to make room for unmovable
> > > allocations. The only recourse we have for these pages is reclaim,
> > > which due to a lack of swap is unavailable in our case.
> >
> > Idk, should we introduce such a mechanism? Or use some alternative
> > heuristics, which would be a better compromise between those who need
> > CMA allocations to always succeed and those who use large CMA areas
> > for opportunistic huge page allocations. Of course, we could add a
> > boot flag/sysctl/per-cma-area flag, but I doubt we really want that.
>
> Right, having migration into CMA could be a viable option as well.
>
> But I would like to learn more from CMA users and their expectations,
> since there isn't currently a guarantee that CMA stays empty.

This change makes CMA allocations less deterministic. If previously a CMA
allocation almost always succeeded, with this change we'll see more
interim failures. (It's all about the period shortly after boot, when the
majority of memory is still empty.)

> This patch would definitely be the simpler solution. It would also
> shave some branches and cycles off the buddy hotpath for many users
> that don't actively use CMA but have CONFIG_CMA=y (I checked archlinux
> and Fedora, not sure about Suse).

Yes, this is good.
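
For context, here is a rough sketch of the __rmqueue() fallback logic
being discussed, paraphrased from mm/page_alloc.c around v6.4. It is
trimmed and the exact code varies by release, so treat it as an
approximation of the pre-patch behavior rather than the patch itself:

/*
 * Paraphrased from mm/page_alloc.c (~v6.4), trimmed for discussion.
 * Pre-patch behavior: movable (ALLOC_CMA) allocations prefer CMA only
 * once CMA makes up more than half of the zone's free memory;
 * otherwise CMA is tapped as a fallback after the preferred
 * migratetype runs dry, before stealing from other migratetypes.
 */
static __always_inline struct page *
__rmqueue(struct zone *zone, unsigned int order, int migratetype,
          unsigned int alloc_flags)
{
        struct page *page;

        if (IS_ENABLED(CONFIG_CMA)) {
                /*
                 * Balance movable allocations between regular and CMA
                 * areas by allocating from CMA when over half of the
                 * zone's free memory is in the CMA area.
                 */
                if (alloc_flags & ALLOC_CMA &&
                    zone_page_state(zone, NR_FREE_CMA_PAGES) >
                    zone_page_state(zone, NR_FREE_PAGES) / 2) {
                        page = __rmqueue_cma_fallback(zone, order);
                        if (page)
                                return page;
                }
        }
retry:
        /* Try the requested migratetype's own freelists first. */
        page = __rmqueue_smallest(zone, order, migratetype);
        if (unlikely(!page)) {
                /* Movable requests may dip into CMA pageblocks... */
                if (alloc_flags & ALLOC_CMA)
                        page = __rmqueue_cma_fallback(zone, order);

                /* ...and only then do we steal from other migratetypes. */
                if (!page && __rmqueue_fallback(zone, order, migratetype,
                                                alloc_flags))
                        goto retry;
        }
        return page;
}

The change under discussion effectively drops the half-of-free-memory
condition so that ALLOC_CMA requests always try __rmqueue_cma_fallback()
first. That is what makes CMA fill up front, and presumably also accounts
for the branch and cycle savings on the hotpath Johannes mentions.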