Date: Wed, 22 Nov 2023 11:02:35 +0100
From: Michal Hocko
To: Chengming Zhou
Cc: LKML, linux-mm, jack@suse.cz, Tejun Heo, Johannes Weiner, Christoph Hellwig, shr@devkernel.io, neilb@suse.de
Subject: Re: Question: memcg dirty throttle caused by low per-memcg dirty thresh
In-Reply-To: <109029e0-1772-4102-a2a8-ab9076462454@linux.dev>

On Wed 22-11-23 17:38:25, Chengming Zhou wrote:
> Hello all,
>
> Sorry to bother you. We encountered a problem related to the memcg dirty
> throttle after migrating from cgroup v1 to v2, so we want to ask for some
> comments or suggestions.
>
> 1. Problem
>
> We have the "containerd" service running under system.slice, with
> its memory.max set to 5GB. It is constantly throttled in
> balance_dirty_pages() since the memcg has more dirty memory than
> the memcg dirty thresh.
>
> We didn't have this problem on cgroup v1, because cgroup v1 doesn't have
> per-memcg writeback or a per-memcg dirty thresh. Only the global
> dirty thresh is checked in balance_dirty_pages().

Yes, v1 didn't have any sensible IO throttling, so we had to rely on an
ugly hack that waits for writeback to finish from the memcg memory reclaim
path. This is really suboptimal because it makes memcg reclaim stalls hard
to predict, so it is essentially only a poor man's OOM prevention.

V2, on the other hand, has memcg-aware dirty memory throttling, which is a
much better solution as it throttles at the moment the memory is being
dirtied.

Why do you consider that to be a problem? Could the constant throttling
you describe be a result of the limit being too small?

>
> 2. Thinking
>
> So we wonder if we can support the per-memcg dirty thresh interface?
> Now the memcg dirty thresh is just calculated from memcg max * ratio,
> which can be set from /proc/sys/vm/dirty_ratio.

In general I would recommend using dirty_bytes instead, as the ratio
doesn't scale all that well on larger systems.

> We have to set it to 60 instead of the default 20 to work around this now,
> but worry about the potential side effects.
>
> If we can support the per-memcg dirty thresh interface, we can set
> some containers to a much higher dirty_ratio, especially for hungry
> dirtier workloads like "containerd".

But why would you want that? If you allow heavy writers to dirty a lot of
memory, then flushing that to the backing store will take more time. That
could starve small writers as well, because they could end up queued behind
a huge amount of data to be flushed.

I am no expert on writeback, so others could give you better arguments,
but from my POV dirty data flushing and throttling is mostly a global
mechanism to optimize the IO pattern, and it is a function of the storage
much more than it is workload specific. If your heavy writer hits
throttling too much, then either the limit is too low or you should start
background flushing earlier.

-- 
Michal Hocko
SUSE Labs
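For illustration only, a rough Python sketch of the threshold arithmetic discussed in this thread. It uses the simplified formula quoted above (memcg max * dirty_ratio / 100) and treats memory.max or a host's RAM as fully "dirtyable" memory. That is an assumption: the kernel computes its limits from dirtyable pages, so real thresholds can be lower. The helpers memcg_dirty_thresh() and global_dirty_thresh() are hypothetical names, not kernel functions. The sketch applies vm.dirty_bytes only to the global case (a non-zero dirty_bytes replaces dirty_ratio there) and does not model how the kernel derives a memcg limit from dirty_bytes.

```python
# Rough model of the dirty-threshold arithmetic discussed in this thread.
# Assumption: memory.max (or host RAM) stands in for "dirtyable" memory;
# the kernel's real limits are computed from dirtyable pages and can be lower.

GiB = 1024 ** 3


def memcg_dirty_thresh(memcg_max: int, dirty_ratio: int) -> int:
    """Hypothetical helper: per-memcg threshold as memcg max * ratio / 100."""
    return memcg_max * dirty_ratio // 100


def global_dirty_thresh(dirtyable: int, dirty_ratio: int, dirty_bytes: int = 0) -> int:
    """Hypothetical helper: a non-zero vm.dirty_bytes replaces vm.dirty_ratio."""
    if dirty_bytes:
        return dirty_bytes
    return dirtyable * dirty_ratio // 100


if __name__ == "__main__":
    # containerd example from the thread: memory.max = 5 GiB
    for ratio in (20, 60):
        thresh = memcg_dirty_thresh(5 * GiB, ratio)
        print(f"memory.max=5 GiB, dirty_ratio={ratio}: {thresh / GiB:.1f} GiB")

    # Ratio vs. bytes on hosts of different (illustrative) sizes
    for ram in (16 * GiB, 1024 * GiB):
        by_ratio = global_dirty_thresh(ram, 20)
        by_bytes = global_dirty_thresh(ram, 20, dirty_bytes=1 * GiB)
        print(f"{ram // GiB} GiB host: dirty_ratio=20 -> {by_ratio / GiB:.1f} GiB, "
              f"dirty_bytes=1 GiB -> {by_bytes / GiB:.1f} GiB")
```

With memory.max = 5 GiB this gives roughly 1 GiB at the default ratio of 20 and 3 GiB at 60. The second loop shows that a 20% ratio grows with the host size while a dirty_bytes limit stays fixed, which is the scaling concern raised above.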