From: Yosry Ahmed
Date: Tue, 13 Jun 2023 13:24:24 -0700
Subject: Re: [PATCH v3 0/2] memcontrol: support cgroup level OOM protection
To: Michal Hocko
Cc: 程垲涛 Chengkaitao Cheng, tj@kernel.org, lizefan.x@bytedance.com,
    hannes@cmpxchg.org, corbet@lwn.net, roman.gushchin@linux.dev,
    shakeelb@google.com, akpm@linux-foundation.org, brauner@kernel.org,
    muchun.song@linux.dev, viro@zeniv.linux.org.uk, zhengqi.arch@bytedance.com,
    ebiederm@xmission.com, Liam.Howlett@oracle.com, chengzhihao1@huawei.com,
    pilgrimtao@gmail.com, haolee.swjtu@gmail.com, yuzhao@google.com,
    willy@infradead.org, vasily.averin@linux.dev, vbabka@suse.cz,
    surenb@google.com, sfr@canb.auug.org.au, mcgrof@kernel.org,
    sujiaxun@uniontech.com, feng.tang@intel.com,
    cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, David Rientjes

On Tue, Jun 13, 2023 at 5:06 AM Michal Hocko wrote:
>
> On Tue 13-06-23 01:36:51, Yosry Ahmed wrote:
> > +David Rientjes
> >
> > On Tue, Jun 13, 2023 at 1:27 AM Michal Hocko wrote:
> > >
> > > On Sun 04-06-23 01:25:42, Yosry Ahmed wrote:
> > > [...]
> > > > There has been a parallel discussion in the cover letter thread of v4
> > > > [1]. To summarize, at Google, we have been using OOM scores to
> > > > describe different job priorities in a more explicit way -- regardless
> > > > of memory usage. It is strictly priority-based OOM killing. Ties are
> > > > broken based on memory usage.
> > > >
> > > > We understand that something like memory.oom.protect has an advantage
> > > > in the sense that you can skip killing a process if you know that it
> > > > won't free enough memory anyway, but for an environment where multiple
> > > > jobs of different priorities are running, we find it crucial to be
> > > > able to define strict ordering. Some jobs are simply more important
> > > > than others, regardless of their memory usage.
> > >
> > > I do remember that discussion. I am not a great fan of simple priority
> > > based interfaces TBH.
> > > It sounds like an easy interface, but it hits
> > > complications as soon as you try to define a proper/sensible
> > > hierarchical semantic. I can see how they might work on leaf memcgs with
> > > statically assigned priorities, but that sounds like a very narrow
> > > use case IMHO.
> >
> > Do you mind elaborating on the problem with the hierarchical semantics?
>
> Well, let me be more specific. If you have a simple hierarchical numeric
> enforcement (assume higher priority is more likely to be chosen and the
> effective priority is max(self, max(parents))), then the semantic
> itself is straightforward.
>
> I am not really sure about the practical manageability though. I have a
> hard time imagining priority assignment on something like a shared
> workload with a more complex hierarchy. For example:
>             root
>          /   |   \
>     cont_A cont_B cont_C
>
> each container running its workload with its own hierarchy structure that
> might be rather dynamic during its lifetime. In order to have a
> predictable OOM behavior you need to watch and reassign priorities all
> the time, no?

In our case we don't really manage the entire hierarchy in a centralized
fashion. Each container gets a score based on its relative priority,
and each container is free to set scores within its subcontainers if
needed. Isn't this what the hierarchy is all about? Each parent only
cares about its direct children. On the system level, we care about the
priority ordering of containers. Ordering within containers can be
deferred to containers.

>
> > The way it works with our internal implementation is (imo) sensible
> > and straightforward from a hierarchy POV. Starting at the OOM memcg
> > (which can be root), we recursively compare the OOM scores of the
> > children memcgs and pick the one with the lowest score, until we
> > arrive at a leaf memcg.
>
> This approach has a strong requirement on the memcg hierarchy
> organization.
> Siblings have to be directly comparable because you cut
> off many potential sub-trees this way (e.g. is it easy to tell
> whether you want to rule out all system or user slices?).
>
> I can imagine use cases where this could work reasonably well, e.g. a set
> of workers of different priorities, all of them running under a shared
> memcg parent. But more involved hierarchies seem more complex
> because you always have to keep in mind how the hierarchy is organized
> to get to your desired victim.

I guess the main point is what I mentioned above: you don't need to
manage the entire tree, containers can manage their subtrees. The most
important thing is to provide the kernel with priority ordering among
containers, and optionally priority ordering within a container
(disregarding other containers).

>
> --
> Michal Hocko
> SUSE Labs
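The descent described above (start at the OOM memcg, repeatedly step into the child with the lowest OOM score, break ties by memory usage, stop at a leaf) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Google's internal implementation and not a kernel API: the `Memcg` class, its fields, and `pick_victim` are hypothetical; a lower score is assumed to mean lower priority (chosen first); and a tie is assumed to go to the sibling with the larger usage, where usage stands for the whole subtree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memcg:
    """Hypothetical stand-in for a memory cgroup; not a kernel structure."""
    name: str
    score: int  # assumed: lower score = lower priority = chosen first
    usage: int  # assumed: memory charged to this subtree, used only for ties
    children: List["Memcg"] = field(default_factory=list)


def pick_victim(oom_memcg: Memcg) -> Memcg:
    """Walk down from the OOM memcg until a leaf is reached.

    At each level only the direct children are compared: the child with the
    lowest score wins, and among equal scores the larger usage wins.
    """
    node = oom_memcg
    while node.children:
        node = min(node.children, key=lambda c: (c.score, -c.usage))
    return node


# The root / cont_A / cont_B / cont_C layout from the thread; each container
# assigns scores inside its own subtree independently of the others.
root = Memcg("root", 0, 0, [
    Memcg("cont_A", 10, 500, [
        Memcg("cont_A/job1", 5, 300),
        Memcg("cont_A/job2", 1, 200),
    ]),
    Memcg("cont_B", 2, 800, [
        Memcg("cont_B/batch", 1, 600),
        Memcg("cont_B/serving", 9, 200),
    ]),
    Memcg("cont_C", 2, 100),
])

print(pick_victim(root).name)  # cont_B/batch
```

Here cont_B and cont_C tie at score 2, so the larger usage selects cont_B, and its lowest-scored child is the victim. cont_A/job2 also has score 1 but is never examined, because cont_A is ruled out at the top level: this is both the sub-tree cut-off raised above and the reason each container only has to order its own children.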