Message-ID: <109029e0-1772-4102-a2a8-ab9076462454@linux.dev>
Date: Wed, 22 Nov 2023 17:38:25 +0800
From: Chengming Zhou
To: LKML, linux-mm
Cc: jack@suse.cz, Tejun Heo, Johannes Weiner, Christoph Hellwig,
    shr@devkernel.io, neilb@suse.de, Michal Hocko
Subject: Question: memcg dirty throttle caused by low per-memcg dirty thresh

Hello all,

Sorry to bother you. We encountered a problem related to memcg dirty
throttling after migrating from cgroup v1 to v2, and would like to ask
for comments or suggestions.

1. Problem

We have the "containerd" service running under system.slice, with its
memory.max set to 5GB. It is constantly throttled in
balance_dirty_pages() because the memcg has more dirty memory than the
memcg dirty thresh.

We didn't have this problem on cgroup v1, because cgroup v1 has neither
per-memcg writeback nor a per-memcg dirty thresh. Only the global dirty
thresh is checked in balance_dirty_pages().

2. Thinking

So we wonder whether a per-memcg dirty thresh interface could be
supported. Currently the memcg dirty thresh is just calculated as
memcg max * ratio, where the ratio is set via /proc/sys/vm/dirty_ratio.
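
To put rough numbers on the "memcg max * ratio" description, here is a
minimal sketch. It assumes "5GB" means 5 GiB and ignores everything
else the kernel takes into account; the function name is hypothetical
and only for illustration.

```python
GIB = 1024 ** 3

def simple_memcg_dirty_thresh(memcg_max_bytes: int, dirty_ratio: int) -> int:
    """Simplified model only: thresh = memcg max * dirty_ratio / 100."""
    return memcg_max_bytes * dirty_ratio // 100

# Assumption: memory.max of "5GB" is treated as 5 GiB.
memcg_max = 5 * GIB

thresh_default = simple_memcg_dirty_thresh(memcg_max, 20)  # default vm.dirty_ratio
thresh_raised = simple_memcg_dirty_thresh(memcg_max, 60)   # raised ratio

print(f"ratio 20: {thresh_default / GIB:.1f} GiB")
print(f"ratio 60: {thresh_raised / GIB:.1f} GiB")
```

Under this simplification the memcg may hold at most about 1 GiB of
dirty memory at the default ratio and about 3 GiB at a ratio of 60.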
We currently have to set dirty_ratio to 60 instead of the default 20 as
a workaround, but we worry about potential side effects.

If a per-memcg dirty thresh interface were supported, we could set a
much higher dirty_ratio for some containers, especially for heavy
dirtier workloads like "containerd".

3. Solution?

But we couldn't think of a good solution to support this. The current
memcg dirty thresh is calculated from a complex rule:

	memcg dirty thresh = memcg avail * dirty_ratio

memcg avail is derived from a combination of memcg max/high and memcg
files, and is capped by the system-wide clean memory excluding the
amount being used in the memcg.

Although we may find a way to calculate the per-memcg dirty thresh, we
can't use it directly, since we still need to calculate/distribute the
dirty thresh into the per-wb dirty thresh shares.

	R - A - B
	     \-- C

For example, if we know the dirty thresh of A, but the wb is in C, we
have no way to distribute the dirty thresh shares to the wb in C.

But we have to get the dirty thresh of the wb in C, since we need it to
control the throttling of the wb in balance_dirty_pages().

I may have missed something above, but the problem seems clear IMHO.
Looking forward to any comments or suggestions.

Thanks!
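
For reference, the rule from section 3 can be modelled roughly as
below. This is a sketch of the description in this mail, not the kernel
code: all function and parameter names are hypothetical, units are
pages, "headroom" stands for the room left under memcg max/high, and
the sample numbers are made up.

```python
def memcg_avail(file_pages: int, headroom: int, memcg_dirty: int,
                global_avail: int, global_dirty: int) -> int:
    """Model of 'memcg avail': memcg file pages plus the headroom under
    max/high, capped by system-wide clean memory not used by this memcg."""
    # Clean file pages currently held by this memcg.
    clean = file_pages - min(file_pages, memcg_dirty)
    # Clean memory system-wide, then the part outside this memcg.
    global_clean = global_avail - min(global_avail, global_dirty)
    other_clean = global_clean - min(global_clean, clean)
    return file_pages + min(headroom, other_clean)

def memcg_dirty_thresh(avail: int, dirty_ratio: int) -> int:
    """memcg dirty thresh = memcg avail * dirty_ratio (ratio in percent)."""
    return avail * dirty_ratio // 100

# Case 1: plenty of clean memory elsewhere, so the headroom is the limit.
roomy = memcg_avail(file_pages=200_000, headroom=300_000, memcg_dirty=50_000,
                    global_avail=4_000_000, global_dirty=100_000)

# Case 2: little clean memory elsewhere, so the system-wide cap applies.
tight = memcg_avail(file_pages=200_000, headroom=300_000, memcg_dirty=50_000,
                    global_avail=300_000, global_dirty=100_000)

print(roomy, memcg_dirty_thresh(roomy, 20))
print(tight, memcg_dirty_thresh(tight, 20))
```

Even with such a per-memcg number in hand, the open question above
remains: how to split it into per-wb shares when the wb lives in a
descendant such as C.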