2024-03-12 07:54:13

by Rex Nie

[permalink] [raw]
Subject: [PATCH v2] fs/resctrl: fix domid loss precision issue

Below statement from mkdir_mondata_subdir function will loss precision,
because it assigns int to 14 bits bitfield.
priv.u.domid = d->id;

On some platforms(e.g.,x86), the max cache_id is the amount of L3 caches,
so it is not in the range of 0x3fff. But some platforms use higher
cache_id, e.g., arm uses cache_id as locator for cache MSC. This will
cause below issue if cache_id > 0x3fff likes:
/sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat llc_occupancy
cat: read error: No such file or directory

This is the call trace when cat llc_occupancy:
rdtgroup_mondata_show()
domid = md.u.domid
d = resctrl_arch_find_domain(r, domid)

d is null here because of lossing precision

Signed-off-by: Rex Nie <[email protected]>
---
fs/resctrl/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 7a6f46b4edd0..096317610949 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -94,7 +94,7 @@ union mon_data_bits {
struct {
unsigned int rid : 10;
enum resctrl_event_id evtid : 8;
- unsigned int domid : 14;
+ u32 domid;
} u;
};

--
2.34.1



2024-03-14 15:29:03

by Reinette Chatre

[permalink] [raw]
Subject: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)

+x86 maintainers, Tony, Babu, Peter

Hi Everybody,

On 3/12/2024 12:53 AM, Rex Nie wrote:
> Below statement from mkdir_mondata_subdir function will loss precision,
> because it assigns int to 14 bits bitfield.
> priv.u.domid = d->id;
>
> On some platforms(e.g.,x86), the max cache_id is the amount of L3 caches,
> so it is not in the range of 0x3fff. But some platforms use higher
> cache_id, e.g., arm uses cache_id as locator for cache MSC. This will
> cause below issue if cache_id > 0x3fff likes:
> /sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat llc_occupancy
> cat: read error: No such file or directory
>
> This is the call trace when cat llc_occupancy:
> rdtgroup_mondata_show()
> domid = md.u.domid
> d = resctrl_arch_find_domain(r, domid)
>
> d is null here because of lossing precision
>
> Signed-off-by: Rex Nie <[email protected]>
> ---
> fs/resctrl/internal.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 7a6f46b4edd0..096317610949 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -94,7 +94,7 @@ union mon_data_bits {
> struct {
> unsigned int rid : 10;
> enum resctrl_event_id evtid : 8;
> - unsigned int domid : 14;
> + u32 domid;
> } u;
> };
>

resctrl currently supports 32bit builds. Fixing this issue* in this way
would first require that resctrl (the architecture independent fs part)
depend on X86_64. Is this a change that everybody will be comfortable with?

(Of course, there are other solutions available to address the issue mentioned
in this patch that do not require depending on X86_64, but I would like
to take this moment to understand the sentiment surrounding continuing support
for 32bit resctrl.)

Thank you.

Reinette

* Please note that this is not an urgent fix but instead a preparatory change
for future Arm support.

2024-03-15 00:05:05

by Peter Newman

[permalink] [raw]
Subject: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)

Hi Rex,

On Thu, Mar 14, 2024 at 8:25 AM Reinette Chatre
<[email protected]> wrote:
>
> +x86 maintainers, Tony, Babu, Peter
>
> Hi Everybody,
>
> On 3/12/2024 12:53 AM, Rex Nie wrote:
> > Below statement from mkdir_mondata_subdir function will loss precision,
> > because it assigns int to 14 bits bitfield.
> > priv.u.domid = d->id;
> >
> > On some platforms(e.g.,x86), the max cache_id is the amount of L3 caches,
> > so it is not in the range of 0x3fff. But some platforms use higher
> > cache_id, e.g., arm uses cache_id as locator for cache MSC. This will
> > cause below issue if cache_id > 0x3fff likes:
> > /sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat llc_occupancy

This domain ID seems unreadable in decimal and I'm wondering whether
it's the best value to use to identify the domain. Does this system
have over 1 million L3 domains?

Thanks!
-Peter

2024-03-15 00:52:07

by Reinette Chatre

[permalink] [raw]
Subject: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)



On 3/14/2024 5:04 PM, Peter Newman wrote:
> On Thu, Mar 14, 2024 at 8:25 AM Reinette Chatre
> <[email protected]> wrote:
>>
>> +x86 maintainers, Tony, Babu, Peter
>>
>> Hi Everybody,
>>
>> On 3/12/2024 12:53 AM, Rex Nie wrote:
>>> Below statement from mkdir_mondata_subdir function will loss precision,
>>> because it assigns int to 14 bits bitfield.
>>> priv.u.domid = d->id;
>>>
>>> On some platforms(e.g.,x86), the max cache_id is the amount of L3 caches,
>>> so it is not in the range of 0x3fff. But some platforms use higher
>>> cache_id, e.g., arm uses cache_id as locator for cache MSC. This will
>>> cause below issue if cache_id > 0x3fff likes:
>>> /sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat llc_occupancy
>
> This domain ID seems unreadable in decimal and I'm wondering whether
> it's the best value to use to identify the domain. Does this system
> have over 1 million L3 domains?

I propose that Rex includes the information about Arm's cache IDs [1]
in future changelogs.

Reinette


[1] https://lore.kernel.org/lkml/KL1PR0601MB577303C9D0B1247436BB06F8E6242@KL1PR0601MB5773.apcprd06.prod.outlook.com/


2024-03-15 00:56:56

by Rex Nie

[permalink] [raw]
Subject: 答复: 32bit resctrl? (was Re: [PATCH v2] fs /resctrl: fix domid loss precision issue)

HI Peter,
In next link, Maciej Wieczor-Retman and I talked about arm cache id. FYI
https://lore.kernel.org/all/KL1PR0601MB57730994C0501212DF6DDC6EE6242@KL1PR0601MB5773.apcprd06.prod.outlook.com/
BRs
Rex

> -----邮件原件-----
> 发件人: Peter Newman <[email protected]>
> 发送时间: 2024年3月15日 8:05
> 收件人: Rex Nie <[email protected]>
> 抄送: [email protected]; [email protected]; Luck, Tony
> <[email protected]>; Babu Moger <[email protected]>; Borislav
> Petkov <[email protected]>; [email protected]; [email protected];
> [email protected]; Reinette Chatre <[email protected]>
> 主题: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision
> issue)
>
> External Mail: This email originated from OUTSIDE of the organization!
> Do not click links, open attachments or provide ANY information unless you
> recognize the sender and know the content is safe.
>
>
> Hi Rex,
>
> On Thu, Mar 14, 2024 at 8:25 AM Reinette Chatre <[email protected]>
> wrote:
> >
> > +x86 maintainers, Tony, Babu, Peter
> >
> > Hi Everybody,
> >
> > On 3/12/2024 12:53 AM, Rex Nie wrote:
> > > Below statement from mkdir_mondata_subdir function will loss
> > > precision, because it assigns int to 14 bits bitfield.
> > > priv.u.domid = d->id;
> > >
> > > On some platforms(e.g.,x86), the max cache_id is the amount of L3
> > > caches, so it is not in the range of 0x3fff. But some platforms use
> > > higher cache_id, e.g., arm uses cache_id as locator for cache MSC.
> > > This will cause below issue if cache_id > 0x3fff likes:
> > > /sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat
> > > llc_occupancy
>
> This domain ID seems unreadable in decimal and I'm wondering whether it's
> the best value to use to identify the domain. Does this system have over 1
> million L3 domains?
>
> Thanks!
> -Peter

2024-03-15 01:21:56

by Rex Nie

[permalink] [raw]
Subject: 答复: 32bit resctrl? (was Re: [PATCH v2] fs /resctrl: fix domid loss precision issue)



> -----邮件原件-----
> 发件人: Reinette Chatre <[email protected]>
> 发送时间: 2024年3月15日 8:52
> 收件人: Peter Newman <[email protected]>; Rex Nie
> <[email protected]>
> 抄送: [email protected]; [email protected]; Luck, Tony
> <[email protected]>; Babu Moger <[email protected]>; Borislav
> Petkov <[email protected]>; [email protected]; [email protected];
> [email protected]
> 主题: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision
> issue)
>
> External Mail: This email originated from OUTSIDE of the organization!
> Do not click links, open attachments or provide ANY information unless you
> recognize the sender and know the content is safe.
>
>
> On 3/14/2024 5:04 PM, Peter Newman wrote:
> > On Thu, Mar 14, 2024 at 8:25 AM Reinette Chatre
> > <[email protected]> wrote:
> >>
> >> +x86 maintainers, Tony, Babu, Peter
> >>
> >> Hi Everybody,
> >>
> >> On 3/12/2024 12:53 AM, Rex Nie wrote:
> >>> Below statement from mkdir_mondata_subdir function will loss
> >>> precision, because it assigns int to 14 bits bitfield.
> >>> priv.u.domid = d->id;
> >>>
> >>> On some platforms(e.g.,x86), the max cache_id is the amount of L3
> >>> caches, so it is not in the range of 0x3fff. But some platforms use
> >>> higher cache_id, e.g., arm uses cache_id as locator for cache MSC.
> >>> This will cause below issue if cache_id > 0x3fff likes:
> >>> /sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat
> >>> llc_occupancy
> >
> > This domain ID seems unreadable in decimal and I'm wondering whether
> > it's the best value to use to identify the domain. Does this system
> > have over 1 million L3 domains?
>
> I propose that Rex includes the information about Arm's cache IDs [1] in future
> changelogs.
>
> Reinette
>
OK, I will include the link.

Rex

>
> [1]
> https://lore.kernel.org/lkml/KL1PR0601MB577303C9D0B1247436BB06F8E62
> [email protected]/

2024-03-15 16:17:43

by Moger, Babu

[permalink] [raw]
Subject: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)


On 3/14/2024 10:25 AM, Reinette Chatre wrote:
> +x86 maintainers, Tony, Babu, Peter
>
> Hi Everybody,
>
> On 3/12/2024 12:53 AM, Rex Nie wrote:
>> Below statement from mkdir_mondata_subdir function will loss precision,
>> because it assigns int to 14 bits bitfield.
>> priv.u.domid = d->id;
>>
>> On some platforms(e.g.,x86), the max cache_id is the amount of L3 caches,
>> so it is not in the range of 0x3fff. But some platforms use higher
>> cache_id, e.g., arm uses cache_id as locator for cache MSC. This will
>> cause below issue if cache_id > 0x3fff likes:
>> /sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat llc_occupancy
>> cat: read error: No such file or directory
>>
>> This is the call trace when cat llc_occupancy:
>> rdtgroup_mondata_show()
>> domid = md.u.domid
>> d = resctrl_arch_find_domain(r, domid)
>>
>> d is null here because of lossing precision
>>
>> Signed-off-by: Rex Nie <[email protected]>
>> ---
>> fs/resctrl/internal.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 7a6f46b4edd0..096317610949 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -94,7 +94,7 @@ union mon_data_bits {
>> struct {
>> unsigned int rid : 10;
>> enum resctrl_event_id evtid : 8;
>> - unsigned int domid : 14;
>> + u32 domid;
>> } u;
>> };
>>
> resctrl currently supports 32bit builds. Fixing this issue* in this way

I have never bothered about 32bit builds.   Is Intel still testing 32bit
builds?


> would first require that resctrl (the architecture independent fs part)
> depend on X86_64. Is this a change that everybody will be comfortable with?
>
> (Of course, there are other solutions available to address the issue mentioned
> in this patch that do not require depending on X86_64, but I would like
> to take this moment to understand the sentiment surrounding continuing support
> for 32bit resctrl.)

I am thinking we have bigger problem here.

The structure pointer "union mon_data_bits priv;" is created in stack
and passed to create mondata directory. We are reading it later again in
rdtgroup_mondata_show.

How is this pointer valid again?  Shouldn't we use static pointer or
allocate memory for the pointer?

thanks

Babu



>
> Thank you.
>
> Reinette
>
> * Please note that this is not an urgent fix but instead a preparatory change
> for future Arm support.
>

2024-03-15 17:06:57

by Peter Newman

[permalink] [raw]
Subject: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)

On Fri, Mar 15, 2024 at 9:17 AM Moger, Babu <[email protected]> wrote:
>
>
> On 3/14/2024 10:25 AM, Reinette Chatre wrote:
> > +x86 maintainers, Tony, Babu, Peter
> >
> > Hi Everybody,
> >
> > On 3/12/2024 12:53 AM, Rex Nie wrote:
> >> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> >> index 7a6f46b4edd0..096317610949 100644
> >> --- a/fs/resctrl/internal.h
> >> +++ b/fs/resctrl/internal.h
> >> @@ -94,7 +94,7 @@ union mon_data_bits {
> >> struct {
> >> unsigned int rid : 10;
> >> enum resctrl_event_id evtid : 8;
> >> - unsigned int domid : 14;
> >> + u32 domid;
> >> } u;
> >> };
> >>
> > resctrl currently supports 32bit builds. Fixing this issue* in this way
>
> I have never bothered about 32bit builds. Is Intel still testing 32bit
> builds?

I can confirm we don't have any 32-bit builds.


> The structure pointer "union mon_data_bits priv;" is created in stack
> and passed to create mondata directory. We are reading it later again in
> rdtgroup_mondata_show.
>
> How is this pointer valid again? Shouldn't we use static pointer or
> allocate memory for the pointer?

The union is copied by value into the pointer-sized field, hence the
need for pointers to be large enough to hold this value.

-Peter

2024-03-15 18:00:34

by James Morse

[permalink] [raw]
Subject: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)

Hi guys,

On 15/03/2024 16:56, Peter Newman wrote:
> On Fri, Mar 15, 2024 at 9:17 AM Moger, Babu <[email protected]> wrote:
>> On 3/14/2024 10:25 AM, Reinette Chatre wrote:
>>> +x86 maintainers, Tony, Babu, Peter
>>>
>>> Hi Everybody,
>>>
>>> On 3/12/2024 12:53 AM, Rex Nie wrote:
>>>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>>>> index 7a6f46b4edd0..096317610949 100644
>>>> --- a/fs/resctrl/internal.h
>>>> +++ b/fs/resctrl/internal.h
>>>> @@ -94,7 +94,7 @@ union mon_data_bits {
>>>> struct {
>>>> unsigned int rid : 10;
>>>> enum resctrl_event_id evtid : 8;
>>>> - unsigned int domid : 14;
>>>> + u32 domid;
>>>> } u;
>>>> };
>>>>
>>> resctrl currently supports 32bit builds. Fixing this issue* in this way
>>
>> I have never bothered about 32bit builds. Is Intel still testing 32bit
>> builds?
>
> I can confirm we don't have any 32-bit builds.
>
>
>> The structure pointer "union mon_data_bits priv;" is created in stack
>> and passed to create mondata directory. We are reading it later again in
>> rdtgroup_mondata_show.
>>
>> How is this pointer valid again? Shouldn't we use static pointer or
>> allocate memory for the pointer?
>
> The union is copied by value into the pointer-sized field, hence the
> need for pointers to be large enough to hold this value.

Couldn't we allocate the memory for a structure to hold the values we want, then use the
pointer as a pointer?

I suspect whether 32bit builds are important depends on if anyone is using it, which we
can't really know. Debian has 32bit builds, and while its unlikely anyone runs that on a
server part, whatever an "Intel Celeron J3455" is supports RDT too. I'd be keen not to
break it!


As for these eye-sore-ids ... I'm in two minds as to whether we should clean them up in
the kernel. It would be fairly straightforward to scan the PPTT to find them all and map
them to 0,1,2,. But this loses the values provided by the vendor.
x86 and arm64:device-tree systems generate them, so its not clear that user-space needs a
value provided by the vendor here.


Thanks,

James

2024-03-15 22:42:57

by Reinette Chatre

[permalink] [raw]
Subject: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)

Hi James,

On 3/15/2024 11:00 AM, James Morse wrote:
> On 15/03/2024 16:56, Peter Newman wrote:
>> On Fri, Mar 15, 2024 at 9:17 AM Moger, Babu <[email protected]> wrote:
>>> On 3/14/2024 10:25 AM, Reinette Chatre wrote:
>>>> +x86 maintainers, Tony, Babu, Peter
>>>>
>>>> Hi Everybody,
>>>>
>>>> On 3/12/2024 12:53 AM, Rex Nie wrote:
>>>>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>>>>> index 7a6f46b4edd0..096317610949 100644
>>>>> --- a/fs/resctrl/internal.h
>>>>> +++ b/fs/resctrl/internal.h
>>>>> @@ -94,7 +94,7 @@ union mon_data_bits {
>>>>> struct {
>>>>> unsigned int rid : 10;
>>>>> enum resctrl_event_id evtid : 8;
>>>>> - unsigned int domid : 14;
>>>>> + u32 domid;
>>>>> } u;
>>>>> };
>>>>>
>>>> resctrl currently supports 32bit builds. Fixing this issue* in this way
>>>
>>> I have never bothered about 32bit builds. Is Intel still testing 32bit
>>> builds?
>>
>> I can confirm we don't have any 32-bit builds.
>>
>>
>>> The structure pointer "union mon_data_bits priv;" is created in stack
>>> and passed to create mondata directory. We are reading it later again in
>>> rdtgroup_mondata_show.
>>>
>>> How is this pointer valid again? Shouldn't we use static pointer or
>>> allocate memory for the pointer?
>>
>> The union is copied by value into the pointer-sized field, hence the
>> need for pointers to be large enough to hold this value.
>
> Couldn't we allocate the memory for a structure to hold the values we want, then use the
> pointer as a pointer?

Yes, there are a couple of different ways to solve this that remains friendly
to 32-bit. My goal with this thread was to gauge the sentiment surrounding
continuing support for 32-bit resctrl.

> I suspect whether 32bit builds are important depends on if anyone is using it, which we
> can't really know. Debian has 32bit builds, and while its unlikely anyone runs that on a
> server part, whatever an "Intel Celeron J3455" is supports RDT too. I'd be keen not to
> break it!

You are right. We can't really know. My question did not yet receive a request to
keep 32-bit support so this will remain uncertain but I am starting to get a sense that
folks may not be using these builds. I do not think that the issue that Rex reported
warrants such a direction change so I am ok to delay considering moving to 64-bit only
and try to keep 32-bit in mind in future work. I have not been testing 32-bit builds though.

(btw ... "Intel Celeron J3455" details can be seen at [1]. It is a great (64-bit)
platform that I worked with for a while and it supports cache pseudo-locking well.)

> As for these eye-sore-ids ... I'm in two minds as to whether we should clean them up in
> the kernel. It would be fairly straightforward to scan the PPTT to find them all and map
> them to 0,1,2,. But this loses the values provided by the vendor.
> x86 and arm64:device-tree systems generate them, so its not clear that user-space needs a
> value provided by the vendor here.

Another alternative: if I counted right it seems that Arm would need 24bits for these
IDs? That still leaves 8 bits for the resource ID (current max 4) and event ID (current max 3).
How many resources and events are on the horizon for Arm?

Reinette


[1] https://ark.intel.com/content/www/us/en/ark/products/95594/intel-celeron-processor-j3455-2m-cache-up-to-2-3-ghz.html

2024-03-15 23:32:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)

On Thu, Mar 14 2024 at 08:25, Reinette Chatre wrote:
>> On some platforms(e.g.,x86), the max cache_id is the amount of L3 caches,
>> so it is not in the range of 0x3fff. But some platforms use higher
>> cache_id, e.g., arm uses cache_id as locator for cache MSC. This will
>> cause below issue if cache_id > 0x3fff likes:
>> /sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat llc_occupancy
>> cat: read error: No such file or directory
>>
>> This is the call trace when cat llc_occupancy:
>> rdtgroup_mondata_show()
>> domid = md.u.domid
>> d = resctrl_arch_find_domain(r, domid)
>>
>> d is null here because of lossing precision
>>
>> Signed-off-by: Rex Nie <[email protected]>
>> ---
>> fs/resctrl/internal.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 7a6f46b4edd0..096317610949 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -94,7 +94,7 @@ union mon_data_bits {
>> struct {
>> unsigned int rid : 10;
>> enum resctrl_event_id evtid : 8;
>> - unsigned int domid : 14;
>> + u32 domid;
>> } u;
>> };
>>
>
> resctrl currently supports 32bit builds. Fixing this issue* in this way
> would first require that resctrl (the architecture independent fs part)
> depend on X86_64. Is this a change that everybody will be comfortable
> with?

Why? Making mon_data_bits::u larger in the way it has been done does not
have any dependency on 32 or 64 builds unless I'm missing something.

> (Of course, there are other solutions available to address the issue mentioned
> in this patch that do not require depending on X86_64, but I would like
> to take this moment to understand the sentiment surrounding continuing support
> for 32bit resctrl.)

Caring about 32biit resctrl on x86 is a pointless exercise.

Thanks,

tglx

2024-03-16 00:08:29

by Luck, Tony

[permalink] [raw]
Subject: RE: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)

> Why? Making mon_data_bits::u larger in the way it has been done does not
> have any dependency on 32 or 64 builds unless I'm missing something.

That union is copied into the kernfs_node "priv" pointer field. So it has
to fit into whatever size the system is using as a pointer:

See:
mkdir_mondata_subdir()
->mon_addfile()

> Caring about 32biit resctrl on x86 is a pointless exercise.

Agree 100%

-Tony

2024-03-16 00:20:37

by Reinette Chatre

[permalink] [raw]
Subject: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)

Hi Thomas,

On 3/15/2024 4:32 PM, Thomas Gleixner wrote:
> On Thu, Mar 14 2024 at 08:25, Reinette Chatre wrote:
>>> On some platforms(e.g.,x86), the max cache_id is the amount of L3 caches,
>>> so it is not in the range of 0x3fff. But some platforms use higher
>>> cache_id, e.g., arm uses cache_id as locator for cache MSC. This will
>>> cause below issue if cache_id > 0x3fff likes:
>>> /sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat llc_occupancy
>>> cat: read error: No such file or directory
>>>
>>> This is the call trace when cat llc_occupancy:
>>> rdtgroup_mondata_show()
>>> domid = md.u.domid
>>> d = resctrl_arch_find_domain(r, domid)
>>>
>>> d is null here because of lossing precision
>>>
>>> Signed-off-by: Rex Nie <[email protected]>
>>> ---
>>> fs/resctrl/internal.h | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>>> index 7a6f46b4edd0..096317610949 100644
>>> --- a/fs/resctrl/internal.h
>>> +++ b/fs/resctrl/internal.h
>>> @@ -94,7 +94,7 @@ union mon_data_bits {
>>> struct {
>>> unsigned int rid : 10;
>>> enum resctrl_event_id evtid : 8;
>>> - unsigned int domid : 14;
>>> + u32 domid;
>>> } u;
>>> };
>>>
>>
>> resctrl currently supports 32bit builds. Fixing this issue* in this way
>> would first require that resctrl (the architecture independent fs part)
>> depend on X86_64. Is this a change that everybody will be comfortable
>> with?
>
> Why? Making mon_data_bits::u larger in the way it has been done does not
> have any dependency on 32 or 64 builds unless I'm missing something.

I should have expanded the diff. The expanded view of current code below
gives more insight into how a pointer is used to store data:

union mon_data_bits {
void *priv;
struct {
unsigned int rid : 10;
enum resctrl_event_id evtid : 8;
unsigned int domid : 14;
} u;
}

>
>> (Of course, there are other solutions available to address the issue mentioned
>> in this patch that do not require depending on X86_64, but I would like
>> to take this moment to understand the sentiment surrounding continuing support
>> for 32bit resctrl.)
>
> Caring about 32biit resctrl on x86 is a pointless exercise.
>

Thank you Thomas. This code is what will soon be moved into the architecture
agnostic "resctrl filesystem". Are there expectations from more generic
interfaces like this regarding 32-bit/64-bit? "resctrl filesystem" is on a
path to support more architectures (x86, Arm, and RISC-V) and I am not familiar
with the architectures and platforms to know what the impact of such a change
would be nor what existing usages there may be for 32-bit builds.

Reinette



2024-03-18 18:28:08

by James Morse

[permalink] [raw]
Subject: Re: 32bit resctrl? (was Re: [PATCH v2] fs/resctrl: fix domid loss precision issue)

Hi Reinette,

On 15/03/2024 22:42, Reinette Chatre wrote:
> On 3/15/2024 11:00 AM, James Morse wrote:
>> On 15/03/2024 16:56, Peter Newman wrote:
>>> On Fri, Mar 15, 2024 at 9:17 AM Moger, Babu <[email protected]> wrote:
>>>> On 3/14/2024 10:25 AM, Reinette Chatre wrote:
>>>>> On 3/12/2024 12:53 AM, Rex Nie wrote:
>>>>>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>>>>>> index 7a6f46b4edd0..096317610949 100644
>>>>>> --- a/fs/resctrl/internal.h
>>>>>> +++ b/fs/resctrl/internal.h
>>>>>> @@ -94,7 +94,7 @@ union mon_data_bits {
>>>>>> struct {
>>>>>> unsigned int rid : 10;
>>>>>> enum resctrl_event_id evtid : 8;
>>>>>> - unsigned int domid : 14;
>>>>>> + u32 domid;
>>>>>> } u;
>>>>>> };
>>>>>>
>>>>> resctrl currently supports 32bit builds. Fixing this issue* in this way
>>>>
>>>> I have never bothered about 32bit builds. Is Intel still testing 32bit
>>>> builds?
>>>
>>> I can confirm we don't have any 32-bit builds.
>>>
>>>
>>>> The structure pointer "union mon_data_bits priv;" is created in stack
>>>> and passed to create mondata directory. We are reading it later again in
>>>> rdtgroup_mondata_show.
>>>>
>>>> How is this pointer valid again? Shouldn't we use static pointer or
>>>> allocate memory for the pointer?
>>>
>>> The union is copied by value into the pointer-sized field, hence the
>>> need for pointers to be large enough to hold this value.
>>
>> Couldn't we allocate the memory for a structure to hold the values we want, then use the
>> pointer as a pointer?
>
> Yes, there are a couple of different ways to solve this that remains friendly
> to 32-bit. My goal with this thread was to gauge the sentiment surrounding
> continuing support for 32-bit resctrl.
>
>> I suspect whether 32bit builds are important depends on if anyone is using it, which we
>> can't really know. Debian has 32bit builds, and while its unlikely anyone runs that on a
>> server part, whatever an "Intel Celeron J3455" is supports RDT too. I'd be keen not to
>> break it!
>
> You are right. We can't really know. My question did not yet receive a request to
> keep 32-bit support so this will remain uncertain but I am starting to get a sense that
> folks may not be using these builds. I do not think that the issue that Rex reported
> warrants such a direction change so I am ok to delay considering moving to 64-bit only
> and try to keep 32-bit in mind in future work. I have not been testing 32-bit builds though.
>
> (btw ... "Intel Celeron J3455" details can be seen at [1]. It is a great (64-bit)
> platform that I worked with for a while and it supports cache pseudo-locking well.)
>
>> As for these eye-sore-ids ... I'm in two minds as to whether we should clean them up in
>> the kernel. It would be fairly straightforward to scan the PPTT to find them all and map
>> them to 0,1,2,. But this loses the values provided by the vendor.
>> x86 and arm64:device-tree systems generate them, so its not clear that user-space needs a
>> value provided by the vendor here.


> Another alternative: if I counted right it seems that Arm would need 24bits for these
> IDs? That still leaves 8 bits for the resource ID (current max 4) and event ID (current max 3).
> How many resources and events are on the horizon for Arm?

No where near that many! - but its impossible for me to know. MPAM is a bag of bits, its
up to Arm's partners what they build with it. MPAM does have 8 bit counter id,s but there
are only two defined so far. The other fields depend on what people build.

The problem here is the folk who generate the ACPI tables have seen a 32bit field, and
generated a value based on the topology so they know its always unique - instead of taking
a description of the two or three caches that need an ID.

I'll have a go at compressing them into a smaller range in the kernel. As no other
platform preserves a hardware ID here, I don't think its a major loss to re-create these.
It's certainly going to be less ugly to user-space!


Thanks,

James