From: Eric Dumazet
Date: Fri, 9 Jun 2023 11:07:05 +0200
Subject: Re: [RFC PATCH net-next] sock: Propose socket.urgent for sockmem isolation
To: Abel Wu
Cc: Tejun Heo, Christian Warloe, Wei Wang, "David S. Miller",
    Jakub Kicinski, Paolo Abeni, Johannes Weiner, Michal Hocko,
    Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
    David Ahern, Yosry Ahmed, "Matthew Wilcox (Oracle)", Yu Zhao,
    Vasily Averin, Kuniyuki Iwashima, Martin KaFai Lau, Xin Long,
    Jason Xing, Michal Hocko, Alexei Starovoitov, open list,
    "open list:NETWORKING [GENERAL]",
    "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)",
    "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
References: <20230609082712.34889-1-wuyun.abel@bytedance.com>
In-Reply-To: <20230609082712.34889-1-wuyun.abel@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 9, 2023 at 10:28 AM Abel Wu wrote:
>
> This is just a PoC patch intended to resume the discussion about
> tcpmem isolation opened by Google in LPC'22 [1].
>
> We are facing the same problem that the global shared threshold can
> cause isolation issues. Low priority jobs can hog TCP memory and
> adversely impact higher priority jobs. What's worse is that these
> low priority jobs usually have smaller cpu weights leading to poor
> ability to consume rx data.
>
> To tackle this problem, an interface for non-root cgroup memory
> controller named 'socket.urgent' is proposed. It determines whether
> the sockets of this cgroup and its descendants can escape from the
> constraints or not under global socket memory pressure.
>
> The 'urgent' semantics will not take effect under memcg pressure in
> order to protect against worse memstalls, thus will be the same as
> before without this patch.
>
> This proposal doesn't remove protocol's threshold as we found it
> useful in restraining memory defragment. As aforementioned the low
> priority jobs can hog lots of memory, which is unreclaimable and
> unmovable, for some time due to small cpu weight.
>
> So in practice we allow high priority jobs with net-memcg accounting
> enabled to escape the global constraints if the net-memcg itself is
> not under pressure. While for lower priority jobs, the budget will
> be tightened as the memory usage of 'urgent' jobs increases. In this
> way we can finally achieve:
>
>  - Important jobs won't be priority inversed by the background
>    jobs in terms of socket memory pressure/limit.
>
>  - Global constraints are still effective, but only on non-urgent
>    jobs, useful for admins on policy decision on defrag.
>
> Comments/Ideas are welcomed, thanks!
>

This seems to go in the complete opposite direction of what memcg promises.

Can we fix memcg, so that:

Each group can use the memory it was provisioned (this includes TCP buffers)

Global tcp_memory can disappear (set tcp_mem to infinity)
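
For reference, the 'urgent' semantics described in the quoted proposal can be modeled roughly as below. This is only a sketch of the decision as worded in the quoted text, not code from the patch: CgroupModel and global_constraint_applies are hypothetical names, and the real accounting in the kernel is considerably more involved (for example, the tightening of the budget for non-urgent jobs is not modeled here).

```python
from dataclasses import dataclass


@dataclass
class CgroupModel:
    """Hypothetical stand-in for a cgroup's state; not a kernel structure."""
    urgent: bool           # models the proposed 'socket.urgent' setting
    memcg_pressure: bool   # True if this cgroup's own memcg is under pressure


def global_constraint_applies(cg: CgroupModel, global_pressure: bool) -> bool:
    """Return True if global socket memory pressure should restrict this cgroup."""
    if not global_pressure:
        # No global pressure, so there is nothing to escape from.
        return False
    # 'urgent' only takes effect while the cgroup's own memcg is not under pressure.
    if cg.urgent and not cg.memcg_pressure:
        return False
    return True


if __name__ == "__main__":
    # Print the decision for every combination under global pressure.
    for urgent in (False, True):
        for memcg_pressure in (False, True):
            cg = CgroupModel(urgent, memcg_pressure)
            print(urgent, memcg_pressure, global_constraint_applies(cg, True))
```

In this model, only an urgent group whose own memcg is not under pressure escapes the global constraint; every other combination behaves as it does today.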