2024-03-11 06:48:55

by Rex Nie

[permalink] [raw]
Subject: [PATCH] fs/resctrl: fix domid loss precision issue

Below statement from mkdir_mondata_subdir function will loss precision,
because it assigns int to 14 bits bitfield.
priv.u.domid = d->id;

This will cause below issue if cache_id > 0x3fff likes:
/sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat llc_occupancy
cat: read error: No such file or directory

This is the call trace when cat llc_occupancy:
rdtgroup_mondata_show()
domid = md.u.domid
d = resctrl_arch_find_domain(r, domid)

d is null here because of lossing precision

Signed-off-by: Rex Nie <[email protected]>
Signed-off-by: Liming Wu <[email protected]>
---
fs/resctrl/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 7a6f46b4edd0..096317610949 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -94,7 +94,7 @@ union mon_data_bits {
struct {
unsigned int rid : 10;
enum resctrl_event_id evtid : 8;
- unsigned int domid : 14;
+ u32 domid;
} u;
};

--
2.34.1



2024-03-11 08:24:30

by Maciej Wieczor-Retman

[permalink] [raw]
Subject: Re: [PATCH] fs/resctrl: fix domid loss precision issue

Hello,

On 2024-03-11 at 14:48:22 +0800, Rex Nie wrote:
>Below statement from mkdir_mondata_subdir function will loss precision,
>because it assigns int to 14 bits bitfield.
> priv.u.domid = d->id;
>
>This will cause below issue if cache_id > 0x3fff likes:

Is there some reason for cache_id ever being this high?

I thought the max for cache_id was the amount of L3 caches on a system. And I
only observed it going up to 3 on some server platforms. So not nearly in the
range of 0x3fff or 16k.

>/sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat llc_occupancy

How did you get this file to appear? Could you maybe show how your mon_data
directory looks like?

>cat: read error: No such file or directory
>
>This is the call trace when cat llc_occupancy:
>rdtgroup_mondata_show()
> domid = md.u.domid
> d = resctrl_arch_find_domain(r, domid)
>
>d is null here because of lossing precision
>
>Signed-off-by: Rex Nie <[email protected]>
>Signed-off-by: Liming Wu <[email protected]>
>---
> fs/resctrl/internal.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>index 7a6f46b4edd0..096317610949 100644
>--- a/fs/resctrl/internal.h
>+++ b/fs/resctrl/internal.h
>@@ -94,7 +94,7 @@ union mon_data_bits {
> struct {
> unsigned int rid : 10;
> enum resctrl_event_id evtid : 8;
>- unsigned int domid : 14;
>+ u32 domid;
> } u;
> };
>
>--
>2.34.1
>

--
Kind regards
Maciej Wiecz?r-Retman

2024-03-11 09:38:47

by Rex Nie

[permalink] [raw]
Subject: 答复: [PATCH] fs/resctrl: fix domid loss prec ision issue

Hello,
Please kindly check my inline reply. Thanks.
Best regards
Rex Nie

> -----?ʼ?ԭ??-----
> ??????: Maciej Wieczor-Retman <[email protected]>
> ????ʱ??: 2024??3??11?? 16:24
> ?ռ???: Rex Nie <[email protected]>
> ????: [email protected]; [email protected];
> [email protected]; [email protected]; Liming Wu
> <[email protected]>
> ????: Re: [PATCH] fs/resctrl: fix domid loss precision issue
>
> External Mail: This email originated from OUTSIDE of the organization!
> Do not click links, open attachments or provide ANY information unless you
> recognize the sender and know the content is safe.
>
>
> Hello,
>
> On 2024-03-11 at 14:48:22 +0800, Rex Nie wrote:
> >Below statement from mkdir_mondata_subdir function will loss precision,
> >because it assigns int to 14 bits bitfield.
> > priv.u.domid = d->id;
> >
> >This will cause below issue if cache_id > 0x3fff likes:
>
> Is there some reason for cache_id ever being this high?
>
> I thought the max for cache_id was the amount of L3 caches on a system. And I
> only observed it going up to 3 on some server platforms. So not nearly in the
> range of 0x3fff or 16k.
>
It is exactly as you said on X86 platforms, but cache_Id on Arm platform is different.
According to ACPI for mpam, cache id is used as locator for cache MSC. Reference to RD_PPTT_CACHE_ID definition from edk2-platforms:
#define RD_PPTT_CACHE_ID(PackageId, ClusterId, CoreId, CacheType) \
( \
(((PackageId) & 0xF) << 20) | (((ClusterId) & 0xFF) << 12) | \
(((CoreId) & 0xFF) << 4) | ((CacheType) & 0xF) \
)
So it may be > 0x3fff on Arm platform.

Reference RD_PPTT_CACHE_ID from edk2-platforms: https://github.com/tianocore/edk2-platforms/blob/master/Platform/ARM/SgiPkg/Include/SgiAcpiHeader.h#L202

> >/sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat
> >llc_occupancy
>
> How did you get this file to appear? Could you maybe show how your
> mon_data directory looks like?
>
I found this issue on Arm FVP N1 platform and my N2 platform.

Below is the steps on Arm FVP N1:
mount -t resctrl resctrl / /sys/fs/resctrl
cd /sys/fs/resctrl/mon_data

/sys/fs/resctrl/mon_data # ls -l
total 0
dr-xr-xr-x 2 0 0 0 Mar 11 09:24 mon_L3_1048564

cd /sys/fs/resctrl/mon_data # cd mon_L3_1048564
/sys/fs/resctrl/mon_data/mon_L3_1048564 # cat llc_occupancy
cat: read error: No such file or directory

Arm FVP MPAM: https://neoverse-reference-design.docs.arm.com/en/latest/mpam/mpam-resctrl.html#memory-system-resource-partitioning-and-monitoring-mpam

> >cat: read error: No such file or directory
> >
> >This is the call trace when cat llc_occupancy:
> >rdtgroup_mondata_show()
> > domid = md.u.domid
> > d = resctrl_arch_find_domain(r, domid)
> >
> >d is null here because of lossing precision
> >
> >Signed-off-by: Rex Nie <[email protected]>
> >Signed-off-by: Liming Wu <[email protected]>
> >---
> > fs/resctrl/internal.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h index
> >7a6f46b4edd0..096317610949 100644
> >--- a/fs/resctrl/internal.h
> >+++ b/fs/resctrl/internal.h
> >@@ -94,7 +94,7 @@ union mon_data_bits {
> > struct {
> > unsigned int rid : 10;
> > enum resctrl_event_id evtid : 8;
> >- unsigned int domid : 14;
> >+ u32 domid;
> > } u;
> > };
> >
> >--
> >2.34.1
> >
>
> --
> Kind regards
> Maciej Wiecz??r-Retman

2024-03-11 11:50:35

by Maciej Wieczor-Retman

[permalink] [raw]
Subject: Re: 答复 : [PATCH] fs/resctrl: fix domid loss precision issue

Thanks for the reply,

On 2024-03-11 at 09:37:37 +0000, Rex Nie wrote:
>Hello,
> Please kindly check my inline reply. Thanks.
>Best regards
>Rex Nie
>
>> >This will cause below issue if cache_id > 0x3fff likes:
>>
>> Is there some reason for cache_id ever being this high?
>>
>> I thought the max for cache_id was the amount of L3 caches on a system. And I
>> only observed it going up to 3 on some server platforms. So not nearly in the
>> range of 0x3fff or 16k.
>>
>It is exactly as you said on X86 platforms, but cache_Id on Arm platform is different.
>According to ACPI for mpam, cache id is used as locator for cache MSC. Reference to RD_PPTT_CACHE_ID definition from edk2-platforms:
>#define RD_PPTT_CACHE_ID(PackageId, ClusterId, CoreId, CacheType) \
> ( \
> (((PackageId) & 0xF) << 20) | (((ClusterId) & 0xFF) << 12) | \
> (((CoreId) & 0xFF) << 4) | ((CacheType) & 0xF) \
> )
>So it may be > 0x3fff on Arm platform.
>
>Reference RD_PPTT_CACHE_ID from edk2-platforms: https://github.com/tianocore/edk2-platforms/blob/master/Platform/ARM/SgiPkg/Include/SgiAcpiHeader.h#L202

and thanks for clearing it up for me! I browsed some MPAM patches but didn't
notice cache_id was used differently on ARM.

>
>> >/sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat
>> >llc_occupancy
>>
>> How did you get this file to appear? Could you maybe show how your
>> mon_data directory looks like?
>>
>I found this issue on Arm FVP N1 platform and my N2 platform.
>
>Below is the steps on Arm FVP N1:
>mount -t resctrl resctrl / /sys/fs/resctrl
>cd /sys/fs/resctrl/mon_data
>
>/sys/fs/resctrl/mon_data # ls -l
>total 0
>dr-xr-xr-x 2 0 0 0 Mar 11 09:24 mon_L3_1048564
>
>cd /sys/fs/resctrl/mon_data # cd mon_L3_1048564
>/sys/fs/resctrl/mon_data/mon_L3_1048564 # cat llc_occupancy
>cat: read error: No such file or directory
>
>Arm FVP MPAM: https://neoverse-reference-design.docs.arm.com/en/latest/mpam/mpam-resctrl.html#memory-system-resource-partitioning-and-monitoring-mpam
>

--
Kind regards
Maciej Wiecz?r-Retman

2024-03-11 12:15:42

by Rex Nie

[permalink] [raw]
Subject: 答复: 答复: [PATCH] fs/resctrl: fix domid los s precision issue

Hello,
Reply as below.
Best regards
Rex

> -----?ʼ?ԭ??-----
> ??????: Maciej Wieczor-Retman <[email protected]>
> ????ʱ??: 2024??3??11?? 19:50
> ?ռ???: Rex Nie <[email protected]>
> ????: [email protected]; [email protected];
> [email protected]; [email protected]; Liming Wu
> <[email protected]>
> ????: Re: ????: [PATCH] fs/resctrl: fix domid loss precision issue
>
> External Mail: This email originated from OUTSIDE of the organization!
> Do not click links, open attachments or provide ANY information unless you
> recognize the sender and know the content is safe.
>
>
> Thanks for the reply,
>
> On 2024-03-11 at 09:37:37 +0000, Rex Nie wrote:
> >Hello,
> > Please kindly check my inline reply. Thanks.
> >Best regards
> >Rex Nie
> >
> >> >This will cause below issue if cache_id > 0x3fff likes:
> >>
> >> Is there some reason for cache_id ever being this high?
> >>
> >> I thought the max for cache_id was the amount of L3 caches on a
> >> system. And I only observed it going up to 3 on some server
> >> platforms. So not nearly in the range of 0x3fff or 16k.
> >>
> >It is exactly as you said on X86 platforms, but cache_Id on Arm platform is
> different.
> >According to ACPI for mpam, cache id is used as locator for cache MSC.
> Reference to RD_PPTT_CACHE_ID definition from edk2-platforms:
> >#define RD_PPTT_CACHE_ID(PackageId, ClusterId, CoreId, CacheType)
> \
> >
> ( \
> > (((PackageId) & 0xF) << 20) | (((ClusterId) & 0xFF) << 12) |
> \
> > (((CoreId) & 0xFF) << 4) | ((CacheType) & 0xF)
> \
> > )
> >So it may be > 0x3fff on Arm platform.
> >
> >Reference RD_PPTT_CACHE_ID from edk2-platforms:
> >https://github.com/tianocore/edk2-platforms/blob/master/Platform/ARM/Sg
> >iPkg/Include/SgiAcpiHeader.h#L202
>
> and thanks for clearing it up for me! I browsed some MPAM patches but didn't
> notice cache_id was used differently on ARM.
>
Yes, the mpam driver from kernel use the cache_id as component id, domid directly reported from mpam acpi table(struct acpi_table_mpam_msc_res->locator1),
so the use of cache_id in mpam driver is NO different, but the definition of cache_id, or how to generate cache_id from firmware side(UEFI) is different.

I take 2 commits from edk2-platforms of ARM FVP for your reference.
https://git.gitlab.arm.com/arm-reference-solutions/edk2-platforms/-/commit/d9f3290b0ba63dcc4b124017f6dede5afb681cbb
https://git.gitlab.arm.com/arm-reference-solutions/edk2-platforms/-/commit/08fe90e90a84c7ae074ddcfcfa58e73ea7c03a49


> >
> >> >/sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat
> >> >llc_occupancy
> >>
> >> How did you get this file to appear? Could you maybe show how your
> >> mon_data directory looks like?
> >>
> >I found this issue on Arm FVP N1 platform and my N2 platform.
> >
> >Below is the steps on Arm FVP N1:
> >mount -t resctrl resctrl / /sys/fs/resctrl cd /sys/fs/resctrl/mon_data
> >
> >/sys/fs/resctrl/mon_data # ls -l
> >total 0
> >dr-xr-xr-x 2 0 0 0 Mar 11 09:24
> mon_L3_1048564
> >
> >cd /sys/fs/resctrl/mon_data # cd mon_L3_1048564
> >/sys/fs/resctrl/mon_data/mon_L3_1048564 # cat llc_occupancy
> >cat: read error: No such file or directory
> >
> >Arm FVP MPAM:
> >https://neoverse-reference-design.docs.arm.com/en/latest/mpam/mpam-res
> c
> >trl.html#memory-system-resource-partitioning-and-monitoring-mpam
> >
>
> --
> Kind regards
> Maciej Wiecz??r-Retman

2024-03-11 13:26:01

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: 答复: [PATCH] fs/resctrl: fix domid loss precision issue

On Mon, 11 Mar 2024, Maciej Wieczor-Retman wrote:

> Thanks for the reply,
>
> On 2024-03-11 at 09:37:37 +0000, Rex Nie wrote:
> >Hello,
> > Please kindly check my inline reply. Thanks.
> >Best regards
> >Rex Nie
> >
> >> >This will cause below issue if cache_id > 0x3fff likes:
> >>
> >> Is there some reason for cache_id ever being this high?
> >>
> >> I thought the max for cache_id was the amount of L3 caches on a system. And I
> >> only observed it going up to 3 on some server platforms. So not nearly in the
> >> range of 0x3fff or 16k.
> >>
> >It is exactly as you said on X86 platforms, but cache_Id on Arm platform is different.
> >According to ACPI for mpam, cache id is used as locator for cache MSC. Reference to RD_PPTT_CACHE_ID definition from edk2-platforms:
> >#define RD_PPTT_CACHE_ID(PackageId, ClusterId, CoreId, CacheType) \
> > ( \
> > (((PackageId) & 0xF) << 20) | (((ClusterId) & 0xFF) << 12) | \
> > (((CoreId) & 0xFF) << 4) | ((CacheType) & 0xF) \
> > )
> >So it may be > 0x3fff on Arm platform.

Hi Rex,

Please also put that kind of knowledge into the commit message upfront. No
need to be as verbose as you're here (with code quotes, etc.) but stating
that some platforms use higher IDs (e.g., Arm) would be pretty useful in
answering the question why you're doing this change (which is one of the
key points of describing your change).

--
i.

> >Reference RD_PPTT_CACHE_ID from edk2-platforms: https://github.com/tianocore/edk2-platforms/blob/master/Platform/ARM/SgiPkg/Include/SgiAcpiHeader.h#L202
>
> and thanks for clearing it up for me! I browsed some MPAM patches but didn't
> notice cache_id was used differently on ARM.
>
> >
> >> >/sys/fs/resctrl/mon_groups/p1/mon_data/mon_L3_1048564 # cat
> >> >llc_occupancy
> >>
> >> How did you get this file to appear? Could you maybe show how your
> >> mon_data directory looks like?
> >>
> >I found this issue on Arm FVP N1 platform and my N2 platform.
> >
> >Below is the steps on Arm FVP N1:
> >mount -t resctrl resctrl / /sys/fs/resctrl
> >cd /sys/fs/resctrl/mon_data
> >
> >/sys/fs/resctrl/mon_data # ls -l
> >total 0
> >dr-xr-xr-x 2 0 0 0 Mar 11 09:24 mon_L3_1048564
> >
> >cd /sys/fs/resctrl/mon_data # cd mon_L3_1048564
> >/sys/fs/resctrl/mon_data/mon_L3_1048564 # cat llc_occupancy
> >cat: read error: No such file or directory
> >
> >Arm FVP MPAM: https://neoverse-reference-design.docs.arm.com/en/latest/mpam/mpam-resctrl.html#memory-system-resource-partitioning-and-monitoring-mpam
> >
>
>