by Tvrtko Ursulin

Subject: Re: [RFC 11/13] cgroup/drm: Introduce weight based drm cgroup control

On 22/11/2022 21:29, Tejun Heo wrote:
> On Wed, Nov 09, 2022 at 04:11:39PM +0000, Tvrtko Ursulin wrote:
>> +DRM scheduling soft limits
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +Because of the heterogeneous hardware and driver DRM capabilities, soft limits
>> +are implemented as a loose co-operative (bi-directional) interface between the
>> +controller and DRM core.
>> +
>> +The controller configures the GPU time allowed per group and periodically scans
>> +the belonging tasks to detect the over budget condition, at which point it
>> +invokes a callback notifying the DRM core of the condition.
>> +
>> +DRM core provides an API to query per process GPU utilization and a 2nd API to
>> +receive notification from the cgroup controller when the group enters or exits
>> +the over budget condition.
>> +
>> +Individual DRM drivers which implement the interface are expected to act on this
>> +in the best-effort manner only. There are no guarantees that the soft limits
>> +will be respected.
>
> Soft limits is a bit of a misnomer and can be confused with best-effort limits
> such as memory.high. Prolly best to not use the term.

Are you suggesting "best effort limits" or "best effort <something>"? It
would sound good to me if we found the right <something>. Best effort
budget perhaps?

>> +static bool
>> +__start_scanning(struct drm_cgroup_state *root, unsigned int period_us)
>> +{
>> +	struct cgroup_subsys_state *node;
>> +	bool ok = false;
>> +
>> +	rcu_read_lock();
>> +
>> +	css_for_each_descendant_post(node, &root->css) {
>> +		struct drm_cgroup_state *drmcs = css_to_drmcs(node);
>> +
>> +		if (!css_tryget_online(node))
>> +			goto out;
>> +
>> +		drmcs->active_us = 0;
>> +		drmcs->sum_children_weights = 0;
>> +
>> +		if (node == &root->css)
>> +			drmcs->per_s_budget_ns =
>> +				DIV_ROUND_UP_ULL(NSEC_PER_SEC * period_us,
>> +						 USEC_PER_SEC);
>> +		else
>> +			drmcs->per_s_budget_ns = 0;
>> +
>> +		css_put(node);
>> +	}
>> +
>> +	css_for_each_descendant_post(node, &root->css) {
>> +		struct drm_cgroup_state *drmcs = css_to_drmcs(node);
>> +		struct drm_cgroup_state *parent;
>> +		u64 active;
>> +
>> +		if (!css_tryget_online(node))
>> +			goto out;
>> +		if (!node->parent) {
>> +			css_put(node);
>> +			continue;
>> +		}
>> +		if (!css_tryget_online(node->parent)) {
>> +			css_put(node);
>> +			goto out;
>> +		}
>> +		parent = css_to_drmcs(node->parent);
>> +
>> +		active = drmcs_get_active_time_us(drmcs);
>> +		if (active > drmcs->prev_active_us)
>> +			drmcs->active_us += active - drmcs->prev_active_us;
>> +		drmcs->prev_active_us = active;
>> +
>> +		parent->active_us += drmcs->active_us;
>> +		parent->sum_children_weights += drmcs->weight;
>> +
>> +		css_put(node);
>> +		css_put(&parent->css);
>> +	}
>> +
>> +	ok = true;
>> +
>> +out:
>> +	rcu_read_unlock();
>> +
>> +	return ok;
>> +}
>
> A more conventional and scalable way to go about this would be using an
> rbtree keyed by virtual time. Both CFS and blk-iocost are examples of this,
> but I think for drm, it can be a lot simpler.

It's well impressive you were able to figure out what I am doing there.
:) And probably you can see that this is the first time I am attempting
an algorithm like this one. I think I made it /dtrt/ with a few post/pre
walks so the right pieces of data propagate correctly.

Are you suggesting a parallel/shadow tree to be kept in the drm
controller (which would shadow the cgroup hierarchy)? Or something else?
The mention of rbtree is not telling me much, but I will look into the
referenced examples. (Although I will refrain from major rework until
more people start "biting" into all this.)

Also, when you mention scalability, are you concerned about the multiple
tree walks I have per iteration? I wasn't so much worried about that,
definitely not for the RFC, but even in general, due to the relatively low
frequency of scanning and a good amount of less trivial cost being
outside the actual tree walks (drm client walks, GPU utilisation
calculations, maybe more). But perhaps I don't have the right idea on
how big cgroup hierarchies can be compared to the number of drm clients etc.

Regards,

Tvrtko

2022-11-28 20:18:45

by Tejun Heo

Subject: Re: [RFC 11/13] cgroup/drm: Introduce weight based drm cgroup control

Hello,

On Thu, Nov 24, 2022 at 02:32:25PM +0000, Tvrtko Ursulin wrote:
> > Soft limits is a bit of a misnomer and can be confused with best-effort limits
> > such as memory.high. Prolly best to not use the term.
>
> Are you suggesting "best effort limits" or "best effort <something>"? It
> would sound good to me if we found the right <something>. Best effort
> budget perhaps?

A more conventional name would be hierarchical weighted distribution.

> Also, when you mention scalability, are you concerned about the multiple tree
> walks I have per iteration? I wasn't so much worried about that, definitely
> not for the RFC, but even in general, due to the relatively low frequency of
> scanning and a good amount of less trivial cost being outside the actual
> tree walks (drm client walks, GPU utilisation calculations, maybe more). But
> perhaps I don't have the right idea on how big cgroup hierarchies can be
> compared to the number of drm clients etc.

It's just a better way of doing this kind of weight based scheduling. It's
simpler, more scalable and easier to understand how things are working. The
basic idea is pretty simple - each schedulable entity gets assigned a
timestamp and whenever it consumes the target resource, its time is wound
forward by the consumption amount divided by its absolute share - e.g. if
cgroup A deserves 25% of the entire thing and it ran for 1s, its time is
wound forward by 1s / 0.25 == 4s. There's an rbtree keyed by these timestamps
and anything wanting to consume gets put on that tree and whatever is at the
head of the tree is the next thing to run.
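
To make the arithmetic concrete, here is a rough userspace sketch of the
bookkeeping described above. It is not code from the patch series or from
CFS/blk-iocost: every name in it (vt_entity, vt_charge, vt_pick_next,
share_ppm) is made up for illustration, shares are assumed to be already
flattened into absolute parts-per-million, and a linear minimum scan stands
in for the rbtree's leftmost-node lookup to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical schedulable entity; all names here are illustrative. */
struct vt_entity {
	const char *name;
	uint32_t share_ppm;	/* absolute share, 250000 == 25% */
	uint64_t vtime_ns;	/* virtual timestamp */
};

/*
 * Wind virtual time forward by consumption / absolute share.
 * Sketch only: the multiplication overflows u64 for a single charge
 * above roughly five hours of consumption.
 */
static void vt_charge(struct vt_entity *e, uint64_t used_ns)
{
	assert(e->share_ppm > 0);
	e->vtime_ns += used_ns * 1000000ULL / e->share_ppm;
}

/*
 * Return the entity with the smallest virtual time, or NULL if n == 0.
 * A real implementation would keep entities in an rbtree keyed by
 * vtime_ns and take the leftmost node; the O(n) scan is a stand-in.
 * Ties go to the lower array index.
 */
static struct vt_entity *vt_pick_next(struct vt_entity *ents, size_t n)
{
	struct vt_entity *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!best || ents[i].vtime_ns < best->vtime_ns)
			best = &ents[i];

	return best;
}
```

With a 25% entity and a 75% entity each charged a fixed slice whenever
picked, the 25% one ends up being picked for roughly a quarter of the
slices, because its timestamp advances three times faster per slice.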

Thanks.

--
tejun