Received: by 2002:a05:6a10:2785:0:0:0:0 with SMTP id ia5csp413182pxb; Wed, 13 Jan 2021 06:51:17 -0800 (PST) X-Google-Smtp-Source: ABdhPJxVWsSS4PhA4vsLXauB/cy9XCddZ2MS4lhUb1gNa6gV7mlzAiGiG+OMmxcVPOJ3u0AWaCXs X-Received: by 2002:a17:906:5857:: with SMTP id h23mr1719646ejs.465.1610549477442; Wed, 13 Jan 2021 06:51:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1610549477; cv=none; d=google.com; s=arc-20160816; b=KzqX+3G8vjtjILrC/qwdWSI7K1fzxUtKAmAODQhtivSOqZgQ04eYFIaRK6whpw29b2 wmQQyB2SDyTJcLFn7MH0GeTP5RhkC1X98YPOr1qIknYRtjBDGr4AqbK2FnoAVBYn3iRg YYFxUr9bH0vPhYut10f3itQPROBQDqhQfFooYDLphqBoEqyddovBlyzDwKpsUrW09AMC tzyX0lRn3IaqRXugk7GgzpSxgVNwuv1L6/wmnYCa9aDkclfSqsxmTZY9YB1OpBdQvNUy TPjKWlIbOwle2cIo2PMHc56BGV4mVOIImy7RZw3uvCWOepX+Vd1JiiNooA3j2TUawE3b xGxQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=nt0P3UB52tUXihhKnY0EGJSOwPIXl2w0VfMCeDXigfg=; b=tOL0Z8AYEjETSFEr5Eb9DpynltHifFD4bnwcUq3KP1OBrMuxEQPg2iCMwibexZ70yd Yd2XnkvAuyxOCzyDkumBFEy6hc03XuLnkz3qepS57PeadcYjjzVDbVBHvwrFoU3FzCI+ rh6uT1quBGcknKAkxyrlUnoJNQX44sO01gzwOJzDUAtfH+EB9jNSUg3uJ0fZVN21HT2h m2QmyCAU88SoWQJUdvyjwd5H9T037w7JyGz2P0ydr1oYYGhhLSDOXkOCL3Ok7Au2hnx8 MmDdZgyhBQHHGX6tRUWmjBBr3/dUXsx/9eLe1s98JI8Hi1Qtrt+SLRthBK7aqgFYCfmw hgNg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=MaYTpfeB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=NONE dis=NONE) header.from=suse.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id q4si1051555eji.465.2021.01.13.06.50.53; Wed, 13 Jan 2021 06:51:17 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=MaYTpfeB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=NONE dis=NONE) header.from=suse.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727180AbhAMOrn (ORCPT + 99 others); Wed, 13 Jan 2021 09:47:43 -0500 Received: from mx2.suse.de ([195.135.220.15]:58200 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727102AbhAMOrm (ORCPT ); Wed, 13 Jan 2021 09:47:42 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1610549215; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=nt0P3UB52tUXihhKnY0EGJSOwPIXl2w0VfMCeDXigfg=; b=MaYTpfeBCSb+4GfytJrC3yMQ85mJn3wsQrdPMm3X8mJpdVwihUs+pt5UikXfQ4Js75AVXa p+88RPnduouIF3z9abzeMfvM3SCYKBhdqRIhhKakryDlNAwM89t/a65OMRWqoBMjPH7Fg8 YApkvhaPiozFN1PYsTU6iJ2/ww75CGU= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 62A00AB92; Wed, 13 Jan 2021 14:46:55 +0000 (UTC) Date: Wed, 13 Jan 2021 15:46:54 +0100 From: Michal Hocko To: Johannes Weiner Cc: Andrew Morton , Tejun Heo , Roman Gushchin , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: Re: [PATCH] mm: memcontrol: prevent starvation when writing memory.high Message-ID: <20210113144654.GD22493@dhcp22.suse.cz> References: <20210112163011.127833-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210112163011.127833-1-hannes@cmpxchg.org> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 12-01-21 11:30:11, Johannes Weiner wrote: > When a value is written to a cgroup's memory.high control file, the > write() context first tries to reclaim the cgroup to size before > putting the limit in place for the workload. Concurrent charges from > the workload can keep such a write() looping in reclaim indefinitely. > > In the past, a write to memory.high would first put the limit in place > for the workload, then do targeted reclaim until the new limit has > been met - similar to how we do it for memory.max. This wasn't prone > to the described starvation issue. However, this sequence could cause > excessive latencies in the workload, when allocating threads could be > put into long penalty sleeps on the sudden memory.high overage created > by the write(), before that had a chance to work it off. > > Now that memory_high_write() performs reclaim before enforcing the new > limit, reflect that the cgroup may well fail to converge due to > concurrent workload activity. Bail out of the loop after a few tries. I can see that you have provided some more details in follow up replies but I do not see any explicit argument why an excessive time for writer is an actual problem. Could you be more specific? If the writer is time sensitive then there is a trivial way to workaround that and kill it by a signal (timeout 30s echo ....). Btw. this behavior has been considered http://lkml.kernel.org/r/20200710122917.GB3022@dhcp22.suse.cz/ " With this change the reclaim here might be just playing never ending catch up. On the plus side a break out from the reclaim loop would just enforce the limit so if the operation takes too long then the reclaim burden will move over to consumers eventually. So I do not see any real danger. " > Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high") > Cc: # 5.8+ Why is this worth backporting to stable? The behavior is different but I do not think any of them is harmful. > Reported-by: Tejun Heo > Signed-off-by: Johannes Weiner I am not against the patch. The existing interface doesn't provide any meaningful feedback to the userspace anyway. User would have to re check to see the result of the operation. So how hard we try is really an implementation detail. > --- > mm/memcontrol.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 605f671203ef..63a8d47c1cd3 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -6275,7 +6275,6 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, > > for (;;) { > unsigned long nr_pages = page_counter_read(&memcg->memory); > - unsigned long reclaimed; > > if (nr_pages <= high) > break; > @@ -6289,10 +6288,10 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, > continue; > } > > - reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high, > - GFP_KERNEL, true); > + try_to_free_mem_cgroup_pages(memcg, nr_pages - high, > + GFP_KERNEL, true); > > - if (!reclaimed && !nr_retries--) > + if (!nr_retries--) > break; > } > > -- > 2.30.0 > -- Michal Hocko SUSE Labs