2024-02-21 09:21:33

by Haifeng Xu

Subject: [PATCH v2 0/2] Track llc_occpuancy of RMIDs in limbo list

After removing a monitor group, its RMID may not be freed immediately
unless its llc_occupancy is less than the re-allocation threshold. If
the threshold is turned up, the RMID can be reused. To know how high
the threshold should be, it is necessary to acquire the llc_occupancy.
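As a rough illustration of the re-allocation rule described above, the sketch below models the limbo decision in userspace C. It assumes, based on a reading of __check_limbo(), that an RMID stays busy while its occupancy is at or above the threshold (which users can adjust through /sys/fs/resctrl/info/L3_MON/max_threshold_occupancy). The names rmid_is_dirty() and count_freeable() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the limbo check, not kernel code: an RMID is
 * "dirty" (stays in limbo) while its LLC occupancy is at or above the
 * re-allocation threshold.
 */
static bool rmid_is_dirty(uint64_t llc_occupancy, uint64_t threshold)
{
	return llc_occupancy >= threshold;
}

/*
 * Given the occupancies (in bytes) of RMIDs sitting in limbo, count how
 * many of them would be freed with the given threshold.
 */
static size_t count_freeable(const uint64_t *occupancy, size_t n,
			     uint64_t threshold)
{
	size_t freed = 0;

	assert(occupancy != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!rmid_is_dirty(occupancy[i], threshold))
			freed++;
	return freed;
}
```

With occupancies of 1000, 50000 and 200000 bytes, a threshold of 40000 frees only the first RMID, while a threshold of 300000 frees all three. This is why the actual llc_occupancy values are needed when choosing the threshold.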

This patch series adds a new tracepoint to track the llc_occupancy.

Changes since v1:
- Rename pseudo_lock_event.h instead of creating a new header file.
- Modify names used in the new tracepoint
- Update changelog

Haifeng Xu (2):
x86/resctrl: Rename pseudo_lock_event.h to trace.h
x86/resctrl: Add tracepoint for llc_occupancy tracking

arch/x86/kernel/cpu/resctrl/monitor.c | 2 ++
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
.../resctrl/{pseudo_lock_event.h => trace.h} | 21 +++++++++++++++----
3 files changed, 20 insertions(+), 5 deletions(-)
rename arch/x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} (64%)

--
2.25.1



2024-02-21 09:21:51

by Haifeng Xu

Subject: [PATCH v2 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h

Currently only the pseudo-lock code uses tracepoints for event tracking,
but other parts of resctrl may need new tracepoints. It is unnecessary to
create separate header files and define CREATE_TRACE_POINTS in different
C files, which fragments the resctrl tracing.

Therefore, the new tracepoints should be placed in the same header file,
and the header file needs a more generic name.
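For readers unfamiliar with why a single header matters: TRACE_EVENT() definitions are expanded more than once (the header is re-read under TRACE_HEADER_MULTI_READ), and CREATE_TRACE_POINTS is defined in exactly one C file before including the header. The userspace X-macro sketch below is only an analogy for that "define once, expand many times" idea, not the actual tracepoint machinery. RESCTRL_EVENTS, enum resctrl_event and event_name() are hypothetical; the two event names are the resctrl tracepoints that appear in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Analogy only: a single list of events, playing the role of trace.h. */
#define RESCTRL_EVENTS(X)		\
	X(pseudo_lock_l3)		\
	X(mon_llc_occupancy_limbo)

/* First expansion: an enum of event IDs. */
#define AS_ENUM(name) EV_##name,
enum resctrl_event { RESCTRL_EVENTS(AS_ENUM) EV_COUNT };
#undef AS_ENUM

/* Second expansion of the same list: a table of names indexed by ID. */
#define AS_NAME(name) [EV_##name] = #name,
static const char *const event_names[EV_COUNT] = { RESCTRL_EVENTS(AS_NAME) };
#undef AS_NAME

static const char *event_name(enum resctrl_event ev)
{
	assert(ev < EV_COUNT);
	return event_names[ev];
}
```

Adding an event means editing one list, and every expansion picks it up; keeping all resctrl tracepoints in one header gives the same benefit.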

No functional change.

Signed-off-by: Haifeng Xu <[email protected]>
Suggested-by: Reinette Chatre <[email protected]>
---
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
.../kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} | 8 ++++----
2 files changed, 5 insertions(+), 5 deletions(-)
rename arch/x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} (86%)

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 8f559eeae08e..e7bcf8287312 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -31,7 +31,7 @@
#include "internal.h"

#define CREATE_TRACE_POINTS
-#include "pseudo_lock_event.h"
+#include "trace.h"

/*
* The bits needed to disable hardware prefetching varies based on the
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h b/arch/x86/kernel/cpu/resctrl/trace.h
similarity index 86%
rename from arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
rename to arch/x86/kernel/cpu/resctrl/trace.h
index 428ebbd4270b..495fb90c8572 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
+++ b/arch/x86/kernel/cpu/resctrl/trace.h
@@ -2,8 +2,8 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM resctrl

-#if !defined(_TRACE_PSEUDO_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_PSEUDO_LOCK_H
+#if !defined(_TRACE_RESCTRL_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_RESCTRL_H

#include <linux/tracepoint.h>

@@ -35,9 +35,9 @@ TRACE_EVENT(pseudo_lock_l3,
TP_printk("hits=%llu miss=%llu",
__entry->l3_hits, __entry->l3_miss));

-#endif /* _TRACE_PSEUDO_LOCK_H */
+#endif /* _TRACE_RESCTRL_H */

#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
-#define TRACE_INCLUDE_FILE pseudo_lock_event
+#define TRACE_INCLUDE_FILE trace
#include <trace/define_trace.h>
--
2.25.1


2024-02-21 09:22:00

by Haifeng Xu

Subject: [PATCH v2 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking

In our production environment, after removing monitor groups, those unused
RMIDs get stuck in the limbo list forever because their llc_occupancy values are
always larger than the threshold. But the unused RMIDs can be successfully
freed by turning up the threshold.

In order to know how high the threshold should be, the following steps can
be taken to acquire the llc_occupancy of RMIDs in each RDT domain:

1) perf probe -a '__rmid_read eventid rmid'
perf probe -a '__rmid_read%return $retval'
2) perf record -e probe:__rmid_read -e probe:__rmid_read__return -aR sleep 10
3) perf script > __rmid_read.txt
4) cat __rmid_read.txt | grep "eventid=0x1 " -A 1 | grep "kworker" > llc_occupancy.txt

Instead of using the perf tool to track llc_occupancy and filtering the log manually,
it is more convenient for users to use a tracepoint to do this work. So add a new
tracepoint that shows the llc_occupancy of busy RMIDs when scanning the limbo
list.
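Assuming this patch is applied, the event would appear under the resctrl trace system (for example events/resctrl/mon_llc_occupancy_limbo/enable in tracefs). The sketch below shows one way a user might post-process the trace output to pick a threshold: it looks for the fields printed by TP_printk() and keeps the largest llc_occupancy seen per domain. MAX_DOMAINS, struct limbo_stats and limbo_stats_add_line() are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_DOMAINS 64 /* hypothetical bound for this sketch */

/*
 * Largest llc_occupancy observed per domain, parsed from lines using the
 * format "mon_hw_id=%u domain=%d llc_occupancy=%llu".
 */
struct limbo_stats {
	uint64_t max_occupancy[MAX_DOMAINS];
};

/* Returns 1 if the line carried a limbo event and was recorded, else 0. */
static int limbo_stats_add_line(struct limbo_stats *st, const char *line)
{
	unsigned int mon_hw_id;
	int domain;
	unsigned long long occ;
	const char *p;

	assert(st != NULL && line != NULL);
	/* Skip the task/CPU/timestamp prefix that ftrace adds. */
	p = strstr(line, "mon_hw_id=");
	if (!p)
		return 0;
	if (sscanf(p, "mon_hw_id=%u domain=%d llc_occupancy=%llu",
		   &mon_hw_id, &domain, &occ) != 3)
		return 0;
	(void)mon_hw_id; /* not needed for the per-domain maximum */
	if (domain < 0 || domain >= MAX_DOMAINS)
		return 0;
	if (occ > st->max_occupancy[domain])
		st->max_occupancy[domain] = occ;
	return 1;
}
```

Lines that do not contain the limbo fields are ignored, so the function can be fed the raw trace output line by line; the per-domain maximum is then a lower bound for a threshold that would let every traced RMID be freed.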

Signed-off-by: Haifeng Xu <[email protected]>
Suggested-by: Reinette Chatre <[email protected]>
---
arch/x86/kernel/cpu/resctrl/monitor.c | 2 ++
arch/x86/kernel/cpu/resctrl/trace.h | 13 +++++++++++++
2 files changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f136ac046851..1533b1932b49 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -23,6 +23,7 @@
#include <asm/resctrl.h>

#include "internal.h"
+#include "trace.h"

struct rmid_entry {
u32 rmid;
@@ -302,6 +303,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
}
}
crmid = nrmid + 1;
+ trace_mon_llc_occupancy_limbo(nrmid, d->id, val);
}
}

diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
index 495fb90c8572..4bf95b7b4db8 100644
--- a/arch/x86/kernel/cpu/resctrl/trace.h
+++ b/arch/x86/kernel/cpu/resctrl/trace.h
@@ -35,6 +35,19 @@ TRACE_EVENT(pseudo_lock_l3,
TP_printk("hits=%llu miss=%llu",
__entry->l3_hits, __entry->l3_miss));

+TRACE_EVENT(mon_llc_occupancy_limbo,
+ TP_PROTO(u32 mon_hw_id, int id, u64 occupancy),
+ TP_ARGS(mon_hw_id, id, occupancy),
+ TP_STRUCT__entry(__field(u32, mon_hw_id)
+ __field(int, id)
+ __field(u64, occupancy)),
+ TP_fast_assign(__entry->mon_hw_id = mon_hw_id;
+ __entry->id = id;
+ __entry->occupancy = occupancy;),
+ TP_printk("mon_hw_id=%u domain=%d llc_occupancy=%llu",
+ __entry->mon_hw_id, __entry->id, __entry->occupancy)
+ );
+
#endif /* _TRACE_RESCTRL_H */

#undef TRACE_INCLUDE_PATH
--
2.25.1


2024-02-23 19:42:18

by Reinette Chatre

Subject: Re: [PATCH v2 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking

(+James)

Hi Haifeng and James,

On 2/21/2024 1:21 AM, Haifeng Xu wrote:
> In our production environment, after removing monitor groups, those unused
> RMIDs get stuck in the limbo list forever because their llc_occupancy values are
> always larger than the threshold. But the unused RMIDs can be successfully
> freed by turning up the threshold.
>
> In order to know how high the threshold should be, the following steps can
> be taken to acquire the llc_occupancy of RMIDs in each RDT domain:
>
> 1) perf probe -a '__rmid_read eventid rmid'
> perf probe -a '__rmid_read%return $retval'
> 2) perf record -e probe:__rmid_read -e probe:__rmid_read__return -aR sleep 10
> 3) perf script > __rmid_read.txt
> 4) cat __rmid_read.txt | grep "eventid=0x1 " -A 1 | grep "kworker" > llc_occupancy.txt
>

The details on how perf can be used were useful during the discussion of this
work but can be omitted from this changelog.

> Instead of using the perf tool to track llc_occupancy and filtering the log manually,
> it is more convenient for users to use a tracepoint to do this work. So add a new
> tracepoint that shows the llc_occupancy of busy RMIDs when scanning the limbo
> list.
>
> Signed-off-by: Haifeng Xu <[email protected]>
> Suggested-by: Reinette Chatre <[email protected]>
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 2 ++
> arch/x86/kernel/cpu/resctrl/trace.h | 13 +++++++++++++
> 2 files changed, 15 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index f136ac046851..1533b1932b49 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -23,6 +23,7 @@
> #include <asm/resctrl.h>
>
> #include "internal.h"
> +#include "trace.h"
>
> struct rmid_entry {
> u32 rmid;
> @@ -302,6 +303,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
> }
> }
> crmid = nrmid + 1;
> + trace_mon_llc_occupancy_limbo(nrmid, d->id, val);

This area recently received some changes (you can find the latest on the
x86/cache branch of the tip repo). Please see [1] for a good
description of the new "index". For this tracing to be useful to MPAM
I thus expect that the tracepoint will need to print the MPAM equivalent
to CLOSID, the PARTID. We can refer to this CLOSID/PARTID value as
"ctrl_hw_id".

This snippet can then change to use the new resctrl_arch_rmid_idx_decode()
to learn the "ctrl_hw_id" and "mon_hw_id" and print it as part of
tracepoint:
"ctrl_hw_id=%u mon_hw_id=%u domain=%d llc_occupancy=%llu"

This will be filesystem code so it cannot know how an architecture
treats these numbers. Consequently, this may look strange to x86 users
when ctrl_hw_id will always be X86_RESCTRL_EMPTY_CLOSID ... but it should
be clear that it is invalid?
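To make the x86 concern above concrete, the sketch below models in userspace how the proposed "ctrl_hw_id=%u mon_hw_id=%u domain=%d llc_occupancy=%llu" line would look on x86. It assumes that the x86 resctrl_arch_rmid_idx_decode() on the tip x86/cache branch returns the index unchanged as the RMID and reports X86_RESCTRL_EMPTY_CLOSID, defined as ((u32)~0), as the CLOSID. model_x86_rmid_idx_decode() and format_limbo_event() are hypothetical helpers, not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Assumed to match the x86 definition: all ones means "no CLOSID". */
#define X86_RESCTRL_EMPTY_CLOSID ((u32)~0)

/*
 * Userspace model of the assumed x86 behaviour of
 * resctrl_arch_rmid_idx_decode(): the index is the RMID and there is
 * no CLOSID component.
 */
static void model_x86_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
{
	assert(closid != NULL && rmid != NULL);
	*rmid = idx;
	*closid = X86_RESCTRL_EMPTY_CLOSID;
}

/* Format one limbo event with the field layout proposed in the review. */
static int format_limbo_event(char *buf, size_t len, u32 idx, int domain,
			      u64 occupancy)
{
	u32 closid, rmid;

	model_x86_rmid_idx_decode(idx, &closid, &rmid);
	return snprintf(buf, len,
			"ctrl_hw_id=%u mon_hw_id=%u domain=%d llc_occupancy=%llu",
			closid, rmid, domain, (unsigned long long)occupancy);
}
```

Decoding index 7 in domain 1 with an occupancy of 4096 yields "ctrl_hw_id=4294967295 mon_hw_id=7 domain=1 llc_occupancy=4096", so on x86 the ctrl_hw_id field would always print as 4294967295.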

James, what do you think? Any thoughts on how MPAM will use the limbo handler
to understand what information can be useful to the user here?

Reinette

[1] https://lore.kernel.org/lkml/[email protected]/

2024-02-23 20:01:10

by Reinette Chatre

Subject: Re: [PATCH v2 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h

Hi Haifeng,

On 2/21/2024 1:21 AM, Haifeng Xu wrote:
> Currently only the pseudo-lock code uses tracepoints for event tracking,
> but other parts of resctrl may need new tracepoints. It is unnecessary to
> create separate header files and define CREATE_TRACE_POINTS in different
> C files, which fragments the resctrl tracing.
>
> Therefore, the new tracepoints should be placed in the same header file,
> and the header file needs a more generic name.

Please do stick with imperative mood [1]. For example, something like:
"Give the resctrl tracepoint header file a generic name to support
its use for tracepoints that are not specific to pseudo-locking."

(Please feel free to improve.)

Reinette

[1] https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog


2024-02-23 23:24:22

by Reinette Chatre

Subject: Re: [PATCH v2 0/2] Track llc_occpuancy of RMIDs in limbo list


Hi Haifeng,

Typo in subject: llc_occpuancy -> llc_occupancy
Could you also please use the x86/resctrl subject prefix in the cover letter?

Thank you

Reinette

2024-02-29 03:13:02

by Haifeng Xu

Subject: Re: [PATCH v2 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h



On 2024/2/24 04:00, Reinette Chatre wrote:
> Hi Haifeng,
>
> On 2/21/2024 1:21 AM, Haifeng Xu wrote:
>> Currently only the pseudo-lock code uses tracepoints for event tracking,
>> but other parts of resctrl may need new tracepoints. It is unnecessary to
>> create separate header files and define CREATE_TRACE_POINTS in different
>> C files, which fragments the resctrl tracing.
>>
>> Therefore, the new tracepoints should be placed in the same header file,
>> and the header file needs a more generic name.
>
> Please do stick with imperative mood [1]. For example, something like:
> "Give the resctrl tracepoint header file a generic name to support
> its use for tracepoints that are not specific to pseudo-locking."
>
> (Please feel free to improve.)


Thanks for your suggestion.

>
> Reinette
>
> [1] https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
>

2024-02-29 03:13:42

by Haifeng Xu

Subject: Re: [PATCH v2 0/2] Track llc_occpuancy of RMIDs in limbo list



On 2024/2/24 07:24, Reinette Chatre wrote:
>
> Hi Haifeng,
>
> Typo in subject: llc_occpuancy -> llc_occupancy
> Could you also please use the x86/resctrl subject prefix in the cover letter?

OK, thanks.

>
> Thank you
>
> Reinette

2024-02-29 03:17:49

by Haifeng Xu

Subject: Re: [PATCH v2 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking



On 2024/2/24 03:41, Reinette Chatre wrote:
> (+James)
>
> Hi Haifeng and James,
>
> On 2/21/2024 1:21 AM, Haifeng Xu wrote:
>> In our production environment, after removing monitor groups, those unused
>> RMIDs get stuck in the limbo list forever because their llc_occupancy values are
>> always larger than the threshold. But the unused RMIDs can be successfully
>> freed by turning up the threshold.
>>
>> In order to know how high the threshold should be, the following steps can
>> be taken to acquire the llc_occupancy of RMIDs in each RDT domain:
>>
>> 1) perf probe -a '__rmid_read eventid rmid'
>> perf probe -a '__rmid_read%return $retval'
>> 2) perf record -e probe:__rmid_read -e probe:__rmid_read__return -aR sleep 10
>> 3) perf script > __rmid_read.txt
>> 4) cat __rmid_read.txt | grep "eventid=0x1 " -A 1 | grep "kworker" > llc_occupancy.txt
>>
>
> The details on how perf can be used were useful during the discussion of this
> work but can be omitted from this changelog.

Got it.

>
>> Instead of using the perf tool to track llc_occupancy and filtering the log manually,
>> it is more convenient for users to use a tracepoint to do this work. So add a new
>> tracepoint that shows the llc_occupancy of busy RMIDs when scanning the limbo
>> list.
>>
>> Signed-off-by: Haifeng Xu <[email protected]>
>> Suggested-by: Reinette Chatre <[email protected]>
>> ---
>> arch/x86/kernel/cpu/resctrl/monitor.c | 2 ++
>> arch/x86/kernel/cpu/resctrl/trace.h | 13 +++++++++++++
>> 2 files changed, 15 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index f136ac046851..1533b1932b49 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -23,6 +23,7 @@
>> #include <asm/resctrl.h>
>>
>> #include "internal.h"
>> +#include "trace.h"
>>
>> struct rmid_entry {
>> u32 rmid;
>> @@ -302,6 +303,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>> }
>> }
>> crmid = nrmid + 1;
>> + trace_mon_llc_occupancy_limbo(nrmid, d->id, val);
>
> This area recently received some changes (you can find the latest on the
> x86/cache branch of the tip repo). Please see [1] for a good
> description of the new "index". For this tracing to be useful to MPAM
> I thus expect that the tracepoint will need to print the MPAM equivalent
> to CLOSID, the PARTID. We can refer to this CLOSID/PARTID value as
> "ctrl_hw_id".
>
> This snippet can then change to use the new resctrl_arch_rmid_idx_decode()
> to learn the "ctrl_hw_id" and "mon_hw_id" and print it as part of
> tracepoint:
> "ctrl_hw_id=%u mon_hw_id=%u domain=%d llc_occupancy=%llu"

OK, I'll post a new patch based on the tip repo.

>
> This will be filesystem code so it cannot know how an architecture
> treats these numbers. Consequently, this may look strange to x86 users
> when ctrl_hw_id will always be X86_RESCTRL_EMPTY_CLOSID ... but it should
> be clear that it is invalid?
>
> James, what do you think? Any thoughts on how MPAM will use the limbo handler
> to understand what information can be useful to the user here?
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/20240213184438.16675-7-james.morse@arm.com/

2024-03-01 17:48:32

by James Morse

Subject: Re: [PATCH v2 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking

Hi Reinette,

On 23/02/2024 19:41, Reinette Chatre wrote:
> On 2/21/2024 1:21 AM, Haifeng Xu wrote:
>> In our production environment, after removing monitor groups, those unused
>> RMIDs get stuck in the limbo list forever because their llc_occupancy values are
>> always larger than the threshold. But the unused RMIDs can be successfully
>> freed by turning up the threshold.
>>
>> In order to know how high the threshold should be, the following steps can
>> be taken to acquire the llc_occupancy of RMIDs in each RDT domain:
>>
>> 1) perf probe -a '__rmid_read eventid rmid'
>> perf probe -a '__rmid_read%return $retval'
>> 2) perf record -e probe:__rmid_read -e probe:__rmid_read__return -aR sleep 10
>> 3) perf script > __rmid_read.txt
>> 4) cat __rmid_read.txt | grep "eventid=0x1 " -A 1 | grep "kworker" > llc_occupancy.txt

Ah, this ftrace trickery. It wouldn't be portable - I agree a tracepoint is much better!


> The details on how perf can be used were useful during the discussion of this
> work but can be omitted from this changelog.
>
>> Instead of using perf tool to track llc_occupancy and filter the log manually,
>> it is more convenient for users to use tracepoint to do this work. So add a new
>> tracepoint that shows the llc_occupancy of busy RMIDs when scanning the limbo
>> list.

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index f136ac046851..1533b1932b49 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -23,6 +23,7 @@
>> #include <asm/resctrl.h>
>>
>> #include "internal.h"
>> +#include "trace.h"
>>
>> struct rmid_entry {
>> u32 rmid;
>> @@ -302,6 +303,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>> }
>> }
>> crmid = nrmid + 1;
>> + trace_mon_llc_occupancy_limbo(nrmid, d->id, val);

> This area recently received some changes (you can find the latest on the
> x86/cache branch of the tip repo). Please see [1] for a good
> description of the new "index". For this tracing to be useful to MPAM
> I thus expect that the tracepoint will need to print the MPAM equivalent
> to CLOSID, the PARTID. We can refer to this CLOSID/PARTID value as
> "ctrl_hw_id".
>
> This snippet can then change to use the new resctrl_arch_rmid_idx_decode()
> to learn the "ctrl_hw_id" and "mon_hw_id" and print it as part of
> tracepoint:
> "ctrl_hw_id=%u mon_hw_id=%u domain=%d llc_occupancy=%llu"
>
> This will be filesystem code so it cannot know how an architecture
> treats these numbers. Consequently, this may look strange to x86 users
> when ctrl_hw_id will always be X86_RESCTRL_EMPTY_CLOSID ... but it should
> be clear that it is invalid?


> James, what do you think? Any thoughts on how MPAM will use the limbo handler
> to understand what information can be useful to the user here?

Initially it will be exactly the same, and this certainly works. I agree outputting both
the CLOSID and RMID (with more portable names) is the right thing to do.

I'll reply in more detail on what appears to be v3.
lore.kernel.org/r/[email protected]


Thanks,

James