2024-01-24 22:24:01

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 1/3] x86/resctrl: Display the number of available CLOSIDs

Hi Haifeng,

Thank you for sharing your requirements as well as submitting
changes to support them.

I would like to start with a high level overview that applies
to all three patches you submitted. This relates to customs,
formatting, and style that will help your contributions get
into the kernel.

In your next submission, please do submit your patches together
as a series with a cover letter. This means that your series
starts with a cover letter (think of it as "patch #0") and the
patches are sent in reply to that cover letter. This is described
in more detail in "Documentation/process/5.Posting.rst".

Regarding the patches themselves. Please read and follow
Documentation/process/coding-style.rst and
Documentation/process/maintainer-tip.rst regarding customs.
The former is a general document that applies to the whole kernel
while the latter contains more specific customs related to the
area you are are contributing to here.

As a final comment, please ensure that your patches
pass a "scripts/checkpatch.pl --strict" check. There are more
details about this in "Documentation/process/5.Posting.rst".

While the documents mentioned above should get you started there
is a lot of other valuable information in "Documentation/process".
Consider the index in that directory to help you navigate through
the available topics.

On 1/23/2024 1:20 AM, Haifeng Xu wrote:
> We can know the number of CLOSIDs for each rdt resource, for example:
>
> cat /sys/fs/resctrl/info/L3/num_closids
> cat /sys/fs/resctrl/info/MB/num_closids
> ...
>
> The number of available CLOSIDs is the minimal value of them. When users
> try to create new control groups, to avoid running out of CLOSIDs, they
> have to traverse /sys/fs/resctrl/ and count the number of directories.
>
> To make things more easier, add a RFTYPE_TOP_INFO file 'free_closids'
> that tells users how many free closids are left.

I do not see this as a change that benefits the kernel or user space.
It sounds to me as though user space is planning some behavior based
on what it knows about the current kernel internals and requesting
more information to peek into these internals to make it easier
to do so. The kernel can always choose to do things different
internally, but it is required to maintain a consistent interface to
user space. We should thus always take great care with new interfaces.

From what I can tell user space intends to use this "free_closids"
to mean "how many more control resource groups can be created". This
is not a contract that I think we should enter into. There has been
discussions aiming to disconnect the number of resource groups
from the number of closids (effectively letting resource groups
with the same resource allocations share a closid). This is something
that the kernel may still do at some point but sharing "free_closids"
knowing that user space intends to use it as a "number of resource groups
remaining" counter would make future enhancements like this difficult.

Could you please provide more detail in why this is required? User
space should not need to keep track to know how many groups can be
created, creating a new group will fail with ENOSPC if no more
groups can be created.

Reinette


2024-01-25 07:44:15

by Haifeng Xu

[permalink] [raw]
Subject: Re: [PATCH 1/3] x86/resctrl: Display the number of available CLOSIDs



On 2024/1/25 06:23, Reinette Chatre wrote:
> Hi Haifeng,
>
> Thank you for sharing your requirements as well as submitting
> changes to support them.
>
> I would like to start with a high level overview that applies
> to all three patches you submitted. This relates to customs,
> formatting, and style that will help your contributions get
> into the kernel.
>
> In your next submission, please do submit your patches together
> as a series with a cover letter. This means that your series
> starts with a cover letter (think of it as "patch #0") and the
> patches are sent in reply to that cover letter. This is described
> in more detail in "Documentation/process/5.Posting.rst".
>
> Regarding the patches themselves. Please read and follow
> Documentation/process/coding-style.rst and
> Documentation/process/maintainer-tip.rst regarding customs.
> The former is a general document that applies to the whole kernel
> while the latter contains more specific customs related to the
> area you are are contributing to here.
>
> As a final comment, please ensure that your patches
> pass a "scripts/checkpatch.pl --strict" check. There are more
> details about this in "Documentation/process/5.Posting.rst".
>
> While the documents mentioned above should get you started there
> is a lot of other valuable information in "Documentation/process".
> Consider the index in that directory to help you navigate through
> the available topics.

Thanks for your suggestions.

>
> On 1/23/2024 1:20 AM, Haifeng Xu wrote:
>> We can know the number of CLOSIDs for each rdt resource, for example:
>>
>> cat /sys/fs/resctrl/info/L3/num_closids
>> cat /sys/fs/resctrl/info/MB/num_closids
>> ...
>>
>> The number of available CLOSIDs is the minimal value of them. When users
>> try to create new control groups, to avoid running out of CLOSIDs, they
>> have to traverse /sys/fs/resctrl/ and count the number of directories.
>>
>> To make things more easier, add a RFTYPE_TOP_INFO file 'free_closids'
>> that tells users how many free closids are left.
>
> I do not see this as a change that benefits the kernel or user space.
> It sounds to me as though user space is planning some behavior based
> on what it knows about the current kernel internals and requesting
> more information to peek into these internals to make it easier
> to do so. The kernel can always choose to do things different
> internally, but it is required to maintain a consistent interface to
> user space. We should thus always take great care with new interfaces.
>
> From what I can tell user space intends to use this "free_closids"
> to mean "how many more control resource groups can be created". This
> is not a contract that I think we should enter into. There has been
> discussions aiming to disconnect the number of resource groups
> from the number of closids (effectively letting resource groups
> with the same resource allocations share a closid).

Is this feature merged into latest kernel or just discussions?
Could you please provide more details?
Last time, you mentioned that a monitoring group can be moved
from one control group to another.

This is something
> that the kernel may still do at some point but sharing "free_closids"
> knowing that user space intends to use it as a "number of resource groups
> remaining" counter would make future enhancements like this difficult.

OK, thanks.

>
> Could you please provide more detail in why this is required? User
> space should not need to keep track to know how many groups can be
> created, creating a new group will fail with ENOSPC if no more
> groups can be created.
>

User space reports alerts when failing to create new groups. If no one tell them that
closids aren't enough, they will keep trying to create new groups and the number of alerts
could be very high.

So we want to know how many closids are available, if it's zero, we give up creating
new control groups and those alerts will disappear.

Maybe user behavoirs can be ajusted. There is no need to create too many groups, especially
for those groups with same resource. Or as you mentioned above, we can reuse closid.


> Reinette

2024-01-25 17:07:02

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 1/3] x86/resctrl: Display the number of available CLOSIDs

Hi Haifeng,

On 1/24/2024 11:44 PM, Haifeng Xu wrote:
> On 2024/1/25 06:23, Reinette Chatre wrote:
>> On 1/23/2024 1:20 AM, Haifeng Xu wrote:
>>> We can know the number of CLOSIDs for each rdt resource, for example:
>>>
>>> cat /sys/fs/resctrl/info/L3/num_closids
>>> cat /sys/fs/resctrl/info/MB/num_closids
>>> ...
>>>
>>> The number of available CLOSIDs is the minimal value of them. When users
>>> try to create new control groups, to avoid running out of CLOSIDs, they
>>> have to traverse /sys/fs/resctrl/ and count the number of directories.
>>>
>>> To make things more easier, add a RFTYPE_TOP_INFO file 'free_closids'
>>> that tells users how many free closids are left.
>>
>> I do not see this as a change that benefits the kernel or user space.
>> It sounds to me as though user space is planning some behavior based
>> on what it knows about the current kernel internals and requesting
>> more information to peek into these internals to make it easier
>> to do so. The kernel can always choose to do things different
>> internally, but it is required to maintain a consistent interface to
>> user space. We should thus always take great care with new interfaces.
>>
>> From what I can tell user space intends to use this "free_closids"
>> to mean "how many more control resource groups can be created". This
>> is not a contract that I think we should enter into. There has been
>> discussions aiming to disconnect the number of resource groups
>> from the number of closids (effectively letting resource groups
>> with the same resource allocations share a closid).
>
> Is this feature merged into latest kernel or just discussions?
> Could you please provide more details?
> Last time, you mentioned that a monitoring group can be moved
> from one control group to another.

It was the original proposal that started a discussion [1]. The discussion
about the problem needing solved instead led to the feature where a monitoring
group can be moved from one control group to another. This solution only works
for AMD and Intel though.

> This is something
>> that the kernel may still do at some point but sharing "free_closids"
>> knowing that user space intends to use it as a "number of resource groups
>> remaining" counter would make future enhancements like this difficult.
>
> OK, thanks.
>
>>
>> Could you please provide more detail in why this is required? User
>> space should not need to keep track to know how many groups can be
>> created, creating a new group will fail with ENOSPC if no more
>> groups can be created.
>>
>
> User space reports alerts when failing to create new groups. If no one tell them that
> closids aren't enough, they will keep trying to create new groups and the number of alerts
> could be very high.

When user space receives the "no space available" it should be expected that
any more attempts to create new groups will also fail (until a resource group is
removed). It is futile for user space to keep trying in this scenario.

>
> So we want to know how many closids are available, if it's zero, we give up creating
> new control groups and those alerts will disappear.
>
> Maybe user behavoirs can be ajusted. There is no need to create too many groups, especially
> for those groups with same resource. Or as you mentioned above, we can reuse closid.

Reinette

[1] https://lore.kernel.org/lkml/CALPaoCj-zav4x6H3ffXo_O+RFan8Qb-uLy-DdtkaQTfuxY4a0w@mail.gmail.com/