Message-ID: <30f42096-3f42-594e-8ff1-c09341925518@linux.intel.com>
Date: Thu, 24 Nov 2022 14:32:25 +0000
Subject: Re: [RFC 11/13] cgroup/drm: Introduce weight based drm cgroup control
To: Tejun Heo
Cc: Intel-gfx@lists.freedesktop.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, Johannes Weiner, Zefan Li, Dave Airlie,
 Daniel Vetter, Rob Clark, Stéphane Marchesin, "T. J. Mercier",
 Kenny.Ho@amd.com, Christian König, Brian Welty, Tvrtko Ursulin
References: <20221109161141.2987173-1-tvrtko.ursulin@linux.intel.com>
 <20221109161141.2987173-12-tvrtko.ursulin@linux.intel.com>
From: Tvrtko Ursulin
Organization: Intel Corporation UK Plc

On 22/11/2022 21:29, Tejun Heo wrote:
> On Wed, Nov 09, 2022 at 04:11:39PM +0000, Tvrtko Ursulin wrote:
>> +DRM scheduling soft limits
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +Because of the heterogeneous hardware and driver DRM capabilities, soft limits
>> +are implemented as a loose co-operative (bi-directional) interface between the
>> +controller and DRM core.
>> +
>> +The controller configures the GPU time allowed per group and periodically scans
>> +the belonging tasks to detect the over budget condition, at which point it
>> +invokes a callback notifying the DRM core of the condition.
>> +
>> +DRM core provides an API to query per process GPU utilization and a 2nd API to
>> +receive notification from the cgroup controller when the group enters or exits
>> +the over budget condition.
>> +
>> +Individual DRM drivers which implement the interface are expected to act on this
>> +in a best-effort manner only. There are no guarantees that the soft limits
>> +will be respected.
>
> Soft limits is a bit of a misnomer and can be confused with best-effort limits
> such as memory.high. Prolly best to not use the term.

Are you suggesting "best effort limits", or some other "best effort" term?
Either sounds good to me if we can find the right name. "Best effort
budget" perhaps?
>> +static bool
>> +__start_scanning(struct drm_cgroup_state *root, unsigned int period_us)
>> +{
>> +	struct cgroup_subsys_state *node;
>> +	bool ok = false;
>> +
>> +	rcu_read_lock();
>> +
>> +	css_for_each_descendant_post(node, &root->css) {
>> +		struct drm_cgroup_state *drmcs = css_to_drmcs(node);
>> +
>> +		if (!css_tryget_online(node))
>> +			goto out;
>> +
>> +		drmcs->active_us = 0;
>> +		drmcs->sum_children_weights = 0;
>> +
>> +		if (node == &root->css)
>> +			drmcs->per_s_budget_ns =
>> +				DIV_ROUND_UP_ULL(NSEC_PER_SEC * period_us,
>> +						 USEC_PER_SEC);
>> +		else
>> +			drmcs->per_s_budget_ns = 0;
>> +
>> +		css_put(node);
>> +	}
>> +
>> +	css_for_each_descendant_post(node, &root->css) {
>> +		struct drm_cgroup_state *drmcs = css_to_drmcs(node);
>> +		struct drm_cgroup_state *parent;
>> +		u64 active;
>> +
>> +		if (!css_tryget_online(node))
>> +			goto out;
>> +		if (!node->parent) {
>> +			css_put(node);
>> +			continue;
>> +		}
>> +		if (!css_tryget_online(node->parent)) {
>> +			css_put(node);
>> +			goto out;
>> +		}
>> +		parent = css_to_drmcs(node->parent);
>> +
>> +		active = drmcs_get_active_time_us(drmcs);
>> +		if (active > drmcs->prev_active_us)
>> +			drmcs->active_us += active - drmcs->prev_active_us;
>> +		drmcs->prev_active_us = active;
>> +
>> +		parent->active_us += drmcs->active_us;
>> +		parent->sum_children_weights += drmcs->weight;
>> +
>> +		css_put(node);
>> +		css_put(&parent->css);
>> +	}
>> +
>> +	ok = true;
>> +
>> +out:
>> +	rcu_read_unlock();
>> +
>> +	return ok;
>> +}
>
> A more conventional and scalable way to go about this would be using an
> rbtree keyed by virtual time. Both CFS and blk-iocost are examples of this,
> but I think for drm, it can be a lot simpler.

It's impressive that you were able to figure out what I am doing there. :)
As you can probably tell, this is the first time I am attempting an
algorithm like this one. I think I made it do the right thing, with a few
post/pre order walks so the right pieces of data propagate correctly.

Are you suggesting keeping a parallel/shadow tree in the drm controller
(one which would shadow the cgroup hierarchy)? Or something else? The
mention of an rbtree alone does not tell me much, but I will look into the
referenced examples (see also my rough sketch in the postscript below).
(Although I will refrain from major rework until more people start
"biting" into all this.)

Also, when you mention scalability, are you concerned about the multiple
tree walks I do per iteration? I wasn't so much worried about that,
definitely not for the RFC, but not even in general, due to the relatively
low frequency of scanning and the fact that a good amount of the
non-trivial cost lies outside the actual tree walks (drm client walks, GPU
utilisation calculations, maybe more). But perhaps I don't have the right
idea of how big cgroup hierarchies can get compared to the number of drm
clients etc.

Regards,

Tvrtko
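P.S. My current (possibly wrong) reading of the "rbtree keyed by virtual
time" suggestion, as a minimal sketch using the kernel rbtree API. All
names below are hypothetical and nothing is lifted from CFS, blk-iocost or
the patch; it assumes a non-zero weight per group:

#include <linux/math64.h>
#include <linux/rbtree.h>
#include <linux/types.h>

struct drm_vtime_node {
	struct rb_node rb;
	u64 vruntime;	/* GPU time consumed, scaled by 1/weight */
	u32 weight;	/* assumed non-zero */
};

/* Keep groups sorted by vruntime; the leftmost node is most "owed" GPU time. */
static void drm_vtime_enqueue(struct rb_root *root, struct drm_vtime_node *vn)
{
	struct rb_node **link = &root->rb_node, *parent = NULL;

	while (*link) {
		struct drm_vtime_node *e;

		parent = *link;
		e = rb_entry(parent, struct drm_vtime_node, rb);
		if (vn->vruntime < e->vruntime)
			link = &parent->rb_left;
		else
			link = &parent->rb_right;
	}

	rb_link_node(&vn->rb, parent, link);
	rb_insert_color(&vn->rb, root);
}

/* Charge @gpu_ns of GPU time: higher weight means slower vruntime advance. */
static void drm_vtime_charge(struct rb_root *root, struct drm_vtime_node *vn,
			     u64 gpu_ns)
{
	rb_erase(&vn->rb, root);
	vn->vruntime += div_u64(gpu_ns, vn->weight);
	drm_vtime_enqueue(root, vn);
}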