Date: Wed, 20 May 2020 21:26:50 +0100
From: Chris Down
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Tejun Heo, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm, memcg: reclaim more aggressively before high allocator throttling
Message-ID: <20200520202650.GB558281@chrisdown.name>
In-Reply-To: <20200520160756.GE6462@dhcp22.suse.cz>

Michal Hocko writes:
>Let me try to understand the actual problem. The high memory reclaim has
>a target which is proportional to the amount of charged memory. For most
>requests that would be SWAP_CLUSTER_MAX though (resp. N times that where
>N is the number of memcgs in excess up the hierarchy).
>I can see to be
>insufficient if the memcg is already in a large excess but if the
>reclaim can make a forward progress this should just work fine because
>each charging context should reclaim at least the contributed amount.
>
>Do you have any insight on why this doesn't work in your situation?
>Especially with such a large inactive file list I would be really
>surprised if the reclaim was not able to make a forward progress.

Reclaim can fail for any number of reasons, which is why we already have
retries sprinkled all over for it. It doesn't seem hard to believe that it
might just fail for transient reasons and drive us deeper into the hole as a
result.

In this case, a.) the application is producing tons of dirty pages, and b.) we
have really heavy systemwide I/O contention on the affected machines. This high
load is one of the reasons that direct and kswapd reclaim cannot keep up, and
thus nr_pages can become a number of orders of magnitude larger than
SWAP_CLUSTER_MAX. This is trivially reproducible on these machines; it's not an
edge case.

Putting a trace_printk("%d\n", __LINE__) at non-successful reclaim in
shrink_page_list shows that what's happening is always (and I really mean
always) the "dirty page and you're not kswapd" check, as expected:

    if (PageDirty(page)) {
        /*
         * Only kswapd can writeback filesystem pages
         * to avoid risk of stack overflow. But avoid
         * injecting inefficient single-page IO into
         * flusher writeback as much as possible: only
         * write pages when we've encountered many
         * dirty pages, and when we've already scanned
         * the rest of the LRU for clean pages and see
         * the same dirty pages again (PageReclaim).
         */
        if (page_is_file_lru(page) &&
            (!current_is_kswapd() || !PageReclaim(page) ||
             !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
            /*
             * Immediately reclaim when written back.
             * Similar in principal to deactivate_page()
             * except we already have the page isolated
             * and know it's dirty
             */
            inc_node_page_state(page, NR_VMSCAN_IMMEDIATE);
            SetPageReclaim(page);

            goto activate_locked;
        }

>Now to your patch. I do not like it much to be honest.
>MEM_CGROUP_RECLAIM_RETRIES is quite arbitrary and I neither like it in
>memory_high_write because the that is an interruptible context so there
>shouldn't be a good reason to give up after $FOO number of failed
>attempts. try_charge and memory_max_write are slightly different because
>we are invoking OOM killer based on the number of failed attempts.

As Johannes mentioned, the very intent of memory.high is to have it managed
using a userspace OOM killer, which monitors PSI. As such, I'm not sure this
distinction means much.
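
To make the shrink_page_list condition quoted above easier to reason about,
here is a rough userspace model of it. This is only a sketch, not kernel code:
struct page_model, skips_dirty_file_page() and the is_kswapd/node_dirty
parameters are made-up names that stand in for PageDirty(),
page_is_file_lru(), PageReclaim(), current_is_kswapd() and the PGDAT_DIRTY
node flag respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-page state the quoted check looks at. */
struct page_model {
    bool dirty;        /* models PageDirty(page) */
    bool file_lru;     /* models page_is_file_lru(page) */
    bool reclaim_flag; /* models PageReclaim(page) */
};

/*
 * Returns true when the modelled reclaimer would take the
 * "goto activate_locked" branch, i.e. skip the dirty file page
 * instead of writing it back.
 *
 * is_kswapd models current_is_kswapd(); node_dirty models
 * test_bit(PGDAT_DIRTY, &pgdat->flags).
 */
static inline bool skips_dirty_file_page(const struct page_model *p,
                                         bool is_kswapd, bool node_dirty)
{
    assert(p != NULL);

    /* The quoted check only applies to dirty pages. */
    if (!p->dirty)
        return false;

    /* Same boolean structure as the inner "if" in the quoted snippet. */
    return p->file_lru &&
           (!is_kswapd || !p->reclaim_flag || !node_dirty);
}
```

With is_kswapd set to false, the model returns true for every dirty file page
no matter what the other two flags say, which is the case the trace output
keeps hitting for direct reclaim.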