Date: Fri, 10 Jul 2020 14:29:17 +0200
From: Michal Hocko
To: Roman Gushchin
Cc: Andrew Morton, Johannes Weiner, Shakeel Butt, linux-mm@kvack.org,
 kernel-team@fb.com, linux-kernel@vger.kernel.org, Domas Mituzas,
 Tejun Heo, Chris Down
Subject: Re: [PATCH] mm: memcontrol: avoid workload stalls when lowering memory.high
Message-ID: <20200710122917.GB3022@dhcp22.suse.cz>
References: <20200709194718.189231-1-guro@fb.com>
In-Reply-To: <20200709194718.189231-1-guro@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 09-07-20 12:47:18, Roman Gushchin wrote:
> Memory.high limit is implemented in a way such that the kernel
> penalizes all threads which are allocating a memory over the limit.
> Forcing all threads into the synchronous reclaim and adding some
> artificial delays allows to slow down the memory consumption and
> potentially give some time for userspace oom handlers/resource control
> agents to react.
>
> It works nicely if the memory usage is hitting the limit from below,
> however it works sub-optimal if a user adjusts memory.high to a value
> way below the current memory usage. It basically forces all workload
> threads (doing any memory allocations) into the synchronous reclaim
> and sleep. This makes the workload completely unresponsive for
> a long period of time and can also lead to a system-wide contention on
> lru locks. It can happen even if the workload is not actually tight on
> memory and has, for example, a ton of cold pagecache.
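
One way to watch the throttling described above from userspace is the
"high" counter in the cgroup v2 memory.events file, which counts how often
tasks of the cgroup were throttled and routed into direct reclaim because
usage exceeded memory.high. The following is only a rough sketch: it
assumes a cgroup v2 hierarchy, and the helper names high_events and
high_events_during are made up for illustration.

```shell
# Hypothetical helper: print the "high" counter from a cgroup's
# memory.events file (cgroup v2 assumed).
# usage: high_events <cgroup-dir>
high_events() {
    awk '$1 == "high" { print $2 }' "$1/memory.events"
}

# Hypothetical helper: run a command and print how many "high" events
# the cgroup accumulated while it ran.
# usage: high_events_during <cgroup-dir> <command> [args...]
high_events_during() {
    cg=$1
    shift
    before=$(high_events "$cg") || return 1
    # Keep the command's own output off stdout so that only the
    # delta is printed there.
    "$@" >&2
    after=$(high_events "$cg") || return 1
    echo $(( ${after:-0} - ${before:-0} ))
}
```

A large delta while memory.high is being lowered would suggest that the
workload's own threads were doing the reclaim rather than the writer.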
>
> In the current implementation writing to memory.high causes an atomic
> update of page counter's high value followed by an attempt to reclaim
> enough memory to fit into the new limit. To fix the problem described
> above, all we need is to change the order of execution: try to push
> the memory usage under the limit first, and only then set the new
> high limit.

Shakeel, would this help with your proactive reclaim use case? It would
require resetting the high limit right after the reclaim returns, which
is quite ugly, but it would at least not require a completely new
interface. You would simply do
	high = current - to_reclaim
	echo $high > memory.high
	echo max > memory.high	# To prevent direct reclaim
				# allocation stalls

The primary reason to set the high limit in advance was to catch
potential runaways more effectively, because they would just get
throttled while memory_high_write is reclaiming. With this change
the reclaim here might just be playing a never-ending game of catch-up.
On the plus side, breaking out of the reclaim loop would just enforce
the limit, so if the operation takes too long then the reclaim burden
will eventually move over to the consumers. So I do not see any real
danger.
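
The sequence sketched above could be wrapped up roughly as follows. This
is only an illustration under stated assumptions: cgroup v2, a kernel in
which writing memory.high reclaims before the new limit takes effect (the
behaviour this patch introduces), and a made-up helper name
proactive_reclaim. memory.current reports usage in bytes, and cgroup v2
spells "no limit" as "max".

```shell
# Hypothetical helper: ask the kernel to reclaim about <bytes> from a
# cgroup by briefly lowering memory.high, then lift the limit again.
# Prints the temporary limit it used.
# usage: proactive_reclaim <cgroup-dir> <bytes-to-reclaim>
proactive_reclaim() {
    cg=$1
    to_reclaim=$2

    # Current usage of the cgroup in bytes.
    current=$(cat "$cg/memory.current") || return 1

    # Clamp at zero so we never compute a negative limit.
    if [ "$to_reclaim" -ge "$current" ]; then
        high=0
    else
        high=$((current - to_reclaim))
    fi

    # The write returns once the kernel has tried to reclaim down to
    # the new value.
    echo "$high" > "$cg/memory.high" || return 1

    # Reset the limit so later allocations are not throttled.
    echo max > "$cg/memory.high" || return 1

    echo "$high"
}
```

For example, "proactive_reclaim /sys/fs/cgroup/workload 104857600" would
try to reclaim about 100 MiB; the cgroup path is only a placeholder. Any
previously configured memory.high value is lost, so a caller that relies
on one would have to save and restore it instead of writing "max".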
> Signed-off-by: Roman Gushchin
> Reported-by: Domas Mituzas
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Tejun Heo
> Cc: Shakeel Butt
> Cc: Chris Down

Acked-by: Michal Hocko

> ---
>  mm/memcontrol.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b8424aa56e14..4b71feee7c42 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6203,8 +6203,6 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  	if (err)
>  		return err;
>
> -	page_counter_set_high(&memcg->memory, high);
> -
>  	for (;;) {
>  		unsigned long nr_pages = page_counter_read(&memcg->memory);
>  		unsigned long reclaimed;
> @@ -6228,6 +6226,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  			break;
>  	}
>
> +	page_counter_set_high(&memcg->memory, high);
> +
>  	return nbytes;
>  }
>
> --
> 2.26.2

-- 
Michal Hocko
SUSE Labs