From: "T.J. Mercier"
Date: Wed, 24 Jan 2024 09:46:23 -0800
Subject: Re: [PATCH] Revert "mm:vmscan: fix inaccurate reclaim during proactive reclaim"
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, android-mm@google.com, yuzhao@google.com, yangyifei03@kuaishou.com, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240121214413.833776-1-tjmercier@google.com> <20240123164819.GB1745986@cmpxchg.org>
In-Reply-To: <20240123164819.GB1745986@cmpxchg.org>

On Tue, Jan 23, 2024 at 8:48 AM Johannes Weiner wrote:
>
> The revert isn't a straight-forward solution.
>
> The patch you're reverting fixed conventional reclaim and broke
> MGLRU. Your revert fixes MGLRU and breaks conventional reclaim.
>
> On Tue, Jan 23, 2024 at 05:58:05AM -0800, T.J. Mercier wrote:
> > They both are able to make progress. The main difference is that a
> > single iteration of try_to_free_mem_cgroup_pages with MGLRU ends soon
> > after it reclaims nr_to_reclaim, and before it touches all memcgs. So
> > a single iteration really will reclaim only about SWAP_CLUSTER_MAX-ish
> > pages with MGLRU. Without MGLRU the memcg walk is not aborted
> > immediately after nr_to_reclaim is reached, so a single call to
> > try_to_free_mem_cgroup_pages can actually reclaim thousands of pages
> > even when sc->nr_to_reclaim is 32. (I.E. MGLRU overreclaims less.)
> > https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@google.com/
>
> Is that a feature or a bug?

Feature!

>  * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing
>  *    of their max_seq counters ensures the eventual fairness to all eligible
>  *    memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
>
> If it bails out exactly after nr_to_reclaim, it'll overreclaim
> less. But with steady reclaim in a complex subtree, it will always hit
> the first cgroup returned by mem_cgroup_iter() and then bail. This
> seems like a fairness issue.

Right. Because the memcg LRU is maintained in pg_data_t and not in
each cgroup, I think we are currently forced to have the iteration
across all child memcgs for non-root memcg reclaim for fairness.

> We should figure out what the right method for balancing fairness with
> overreclaim is, regardless of reclaim implementation. Because having
> two different approaches and reverting dependent things back and forth
> doesn't make sense.
>
> Using an LRU to rotate through memcgs over multiple reclaim cycles
> seems like a good idea. Why is this specific to MGLRU? Shouldn't this
> be a generic piece of memcg infrastructure?

It would be pretty sweet if it were. I haven't tried to measure this
part in isolation, but I know we had to abandon attempts to use
per-app memcgs in the past (2018?) because the perf overhead was too
much. In recent tests where this feature is used, I see some perf
gains which I think are probably attributable to this.

> Then there is the question of why there is an LRU for global reclaim,
> but not for subtree reclaim. Reclaiming a container with multiple
> subtrees would benefit from the fairness provided by a container-level
> LRU order just as much; having fairness for root but not for subtrees
> would produce different reclaim and pressure behavior, and can cause
> regressions when moving a service from bare-metal into a container.
>
> Figuring out these differences and converging on a method for cgroup
> fairness would be the better way of fixing this. Because of the
> regression risk to the default reclaim implementation, I'm inclined to
> NAK this revert.

In the meantime, instead of a revert, how about scaling the batch size
geometrically rather than capping it at the SWAP_CLUSTER_MAX constant:

                reclaimed = try_to_free_mem_cgroup_pages(memcg,
-                                       min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX),
+                                       (nr_to_reclaim - nr_reclaimed)/2,
                                        GFP_KERNEL, reclaim_options);

I think that should address the overreclaim concern (it was mentioned
that the upper bound of overreclaim was 2 * request), and this should
also bring the reclaim rate for root reclaim with MGLRU closer to what
it was before.
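
To make the batch-size comparison concrete, here is a rough userspace
sketch, not kernel code. It counts how many calls each batching policy
needs to reach a target. Everything in it is an assumption for
illustration: fake_reclaim() is a hypothetical stand-in for
try_to_free_mem_cgroup_pages() that reclaims exactly what is requested
(real reclaim can overshoot or fall short), and the floor of one page in
halving_batch() is an addition of this sketch, since (1)/2 == 0 would
otherwise request nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as the kernel constant; redefined here for the sketch. */
#define SWAP_CLUSTER_MAX 32UL

/*
 * Hypothetical stand-in for try_to_free_mem_cgroup_pages(). Assumes each
 * call reclaims exactly the requested number of pages.
 */
static unsigned long fake_reclaim(unsigned long batch)
{
	return batch;
}

/* Current behaviour: min(remaining, SWAP_CLUSTER_MAX). */
static unsigned long fixed_batch(unsigned long remaining)
{
	return remaining < SWAP_CLUSTER_MAX ? remaining : SWAP_CLUSTER_MAX;
}

/*
 * Proposed behaviour: remaining / 2. The floor of 1 is an assumption of
 * this sketch so the loop below always makes progress.
 */
static unsigned long halving_batch(unsigned long remaining)
{
	unsigned long batch = remaining / 2;

	return batch ? batch : 1;
}

/* Count the calls needed to reclaim nr_to_reclaim pages under a policy. */
static unsigned int count_calls(unsigned long nr_to_reclaim,
				unsigned long (*batch_fn)(unsigned long))
{
	unsigned long nr_reclaimed = 0;
	unsigned int calls = 0;

	assert(batch_fn != NULL);

	while (nr_reclaimed < nr_to_reclaim) {
		nr_reclaimed += fake_reclaim(batch_fn(nr_to_reclaim - nr_reclaimed));
		calls++;
	}
	return calls;
}
```

Under these assumptions, a 1024-page request takes 32 calls with the
fixed batch and 11 with halving (512, 256, ..., 2, 1, 1). The sketch says
nothing about fairness across memcgs or about actual overreclaim, which
depend on how the real walk behaves.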