Date: Thu, 11 Jan 2024 16:50:36 +0000
In-Reply-To: <20240111132902.389862-1-hannes@cmpxchg.org>
X-Mailing-List: linux-kernel@vger.kernel.org
References:
 <20240111132902.389862-1-hannes@cmpxchg.org>
Message-ID: <20240111165036.w2qbetwrxb2mcur4@google.com>
Subject: Re: [PATCH] mm: memcontrol: don't throttle dying tasks on memory.high
From: Shakeel Butt
To: Johannes Weiner
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Muchun Song, Tejun Heo,
 Dan Schatzberg, cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Thu, Jan 11, 2024 at 08:29:02AM -0500, Johannes Weiner wrote:
> While investigating hosts with high cgroup memory pressures, Tejun
> found culprit zombie tasks that were holding on to a lot of
> memory, had SIGKILL pending, but were stuck in memory.high reclaim.
>
> In the past, we used to always force-charge allocations from tasks
> that were exiting in order to accelerate them dying and freeing up
> their rss. This changed for memory.max in a4ebf1b6ca1e ("memcg:
> prohibit unconditional exceeding the limit of dying tasks"); it noted
> that this can cause (userspace inducible) containment failures, so it
> added a mandatory reclaim and OOM kill cycle before forcing charges.
> At the time, memory.high enforcement was handled in the userspace
> return path, which isn't reached by dying tasks, and so memory.high
> was still never enforced by dying tasks.
>
> When c9afe31ec443 ("memcg: synchronously enforce memory.high for large
> overcharges") added synchronous reclaim for memory.high, it added
> unconditional memory.high enforcement for dying tasks as well. The
> callstack shows that this path is where the zombie is stuck.
>
> We need to accelerate dying tasks getting past memory.high, but we
> cannot do it quite the same way as we do for memory.max: memory.max is
> enforced strictly, and tasks aren't allowed to move past it without
> FIRST reclaiming and OOM killing if necessary. This ensures very small
> levels of excess. With memory.high, though, enforcement happens lazily
> after the charge, and OOM killing is never triggered. A lot of
> concurrent threads could have pushed, or could actively be pushing,
> the cgroup into excess. The dying task will enter reclaim on every
> allocation attempt, with little hope of restoring balance.
>
> To fix this, skip synchronous memory.high enforcement on dying tasks
> altogether again. Update memory.high path documentation while at it.
>
> Fixes: c9afe31ec443 ("memcg: synchronously enforce memory.high for large overcharges")
> Reported-by: Tejun Heo
> Signed-off-by: Johannes Weiner

Acked-by: Shakeel Butt
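
For anyone following along, the behaviour described in the quoted
changelog can be modelled in a few lines of userspace C. This is only a
rough sketch, not the actual patch: struct memcg_model, struct
task_model, model_reclaim() and model_charge() are hypothetical names
made up for illustration and are not kernel symbols. The model assumes
that memory.high is checked after the charge has already been applied,
and that a dying task simply skips the reclaim step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct memcg_model {
	long usage;        /* pages currently charged to the cgroup */
	long high;         /* modelled memory.high threshold, in pages */
	long reclaimable;  /* pages that a reclaim pass could still free */
};

struct task_model {
	bool dying;        /* modelled "SIGKILL pending / exiting" state */
};

/* Free up to 'want' pages, limited by what is reclaimable. */
static long model_reclaim(struct memcg_model *cg, long want)
{
	long got = want < cg->reclaimable ? want : cg->reclaimable;

	cg->reclaimable -= got;
	cg->usage -= got;
	return got;
}

/*
 * Charge 'pages' to the cgroup. The charge always succeeds because
 * memory.high is enforced lazily, after the fact. Returns true if the
 * task had to enter reclaim for this charge.
 */
static bool model_charge(struct memcg_model *cg, const struct task_model *t,
			 long pages)
{
	assert(pages >= 0);
	cg->usage += pages;

	/* Assumption: dying tasks skip synchronous memory.high reclaim. */
	if (t->dying)
		return false;

	if (cg->usage <= cg->high)
		return false;

	/* A live task over the threshold tries to reclaim the excess. */
	model_reclaim(cg, cg->usage - cg->high);
	return true;
}
```

In this model a live task that pushes usage over the threshold pays for
it with a reclaim pass, while a dying task is charged and returns
immediately, so it can go on to exit and release its memory.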