Date: Tue, 10 May 2022 12:00:04 +0200
From: Michal Hocko
To: CGEL
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, willy@infradead.org,
	shy828301@gmail.com, roman.gushchin@linux.dev, shakeelb@google.com,
	linmiaohe@huawei.com, william.kucharski@oracle.com, peterx@redhat.com,
	hughd@google.com, vbabka@suse.cz, songmuchun@bytedance.com,
	surenb@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Yang Yang
Subject: Re: [PATCH] mm/memcg: support control THP behaviour in cgroup
References: <20220505033814.103256-1-xu.xin16@zte.com.cn>
	<6275d3e7.1c69fb81.1d62.4504@mx.google.com>
	<6278fa75.1c69fb81.9c598.f794@mx.google.com>
	<6279c354.1c69fb81.7f6c1.15e0@mx.google.com>
In-Reply-To: <6279c354.1c69fb81.7f6c1.15e0@mx.google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 10-05-22 01:43:38, CGEL wrote:
> On Mon, May 09, 2022 at 01:48:39PM +0200, Michal Hocko wrote:
> > On Mon 09-05-22 11:26:43, CGEL wrote:
> > > On Mon, May 09, 2022 at 12:00:28PM +0200, Michal Hocko wrote:
> > > > On Sat 07-05-22 02:05:25, CGEL wrote:
> > > > [...]
> > > > > If there are many containers to run on one host, and some of them
> > > > > have high performance requirements, the administrator could turn on
> > > > > THP for them:
> > > > >   # docker run -it --thp-enabled=always
> > > > > Then all the processes in those containers will always use THP.
> > > > > While other containers turn off THP by:
> > > > >   # docker run -it --thp-enabled=never
> > > >
> > > > I do not know. The THP config space is already too confusing and
> > > > complex, and this just adds on top. E.g. is the behavior of the knob
> > > > hierarchical? What is the policy if the parent memcg says madvise
> > > > while the child says always? How does the per-application
> > > > configuration align with all that (e.g. memcg policy madvise, but the
> > > > application says never via prctl while still using some madvised
> > > > regions - e.g. via a library)?
> > >
> > > The cgroup THP behavior is aligned to the host and totally independent,
> > > just like /sys/fs/cgroup/memory.swappiness. That means if one cgroup
> > > configures 'always' for THP, it does not affect the host or other
> > > cgroups. This makes it simple for users to understand and control.
> >
> > All controls in cgroup v2 should be hierarchical. This is really
> > required for a proper delegation semantic.
>
> Could we align to the semantics of /sys/fs/cgroup/memory.swappiness?
> Some distributions like Ubuntu are still using cgroup v1.

The cgroup v1 interface is mostly frozen. All new features are added to the
v2 interface.

> > > If the memcg policy is madvise but the application says never, then,
> > > just like on the host, the result is no THP for that application.
> > >
> > > > > By doing this we could improve important containers' performance
> > > > > with a smaller THP footprint.
> > > >
> > > > Do we really want to provide something like THP-based QoS? To me it
> > > > sounds like a bad idea, and if the justification is "it might be
> > > > useful" then I would say no. So you really need to come up with a
> > > > very good usecase to promote this further.
> > >
> > > At least on some 5G (communication technology) machines, it's useful to
> > > provide THP-based QoS. Those 5G machines use a micro-service software
> > > architecture; in other words, one service application runs in one
> > > container.
> >
> > I am not really sure I understand.
> > If this is one application per container (cgroup), then why do you
> > really need a per-group setting? Is the application a set of different
> > processes which are only very loosely coupled?
>
> For a micro-service architecture, the application in one container is not
> a set of loosely coupled processes; it aims to provide one certain
> service. So different containers mean different services, and different
> services have different QoS demands.

OK, if they are tightly coupled you could apply the same THP policy via the
existing prctl interface. Why is that not feasible? As you are noting
below...

> 5. containers are usually managed by compose software, which treats the
> container as the base management unit;

...so the compose software can easily start up the workload by using prctl
to disable THP for whatever workloads it is not suitable for.
-- 
Michal Hocko
SUSE Labs