2024-04-08 09:24:09

by Haifeng Xu

[permalink] [raw]
Subject: [PATCH v7 0/2] x86/resctrl: Track llc_occupancy of RMIDs in limbo list

After removing a monitor group, its RMID may not be freed immediately
unless its llc_occupancy is less than the re-allocation threshold. If
turning up the threshold, the RMID can be reused. In order to know how
much the threshold should be, it's necessary to acquire the llc_occupancy.

The patch series provides a new tracepoint to track the llc_occupancy.

Changes since v1:
- Rename pseudo_lock_event.h instead of creating a new header file.
- Modify names used in the tracepoint.
- Update changelog.

Changes since v2:
- Fix typo and use the x86/resctrl subject prefix in the cover letter.
- Track both CLOSID and RMID in the tracepoint.
- Remove the details on how perf can be used in patch2's changelog.

Changes since v3:
- Put the tracepoint in the 'else' section of the if/else after
resctrl_arch_rmid_read().
- Modify names used in the tracepoint.
- Document the properties of the tracepoint.

Changes since v4:
- Add Reviewed-by tags.
- Include more maintainers in the submission.

Changes since v5:
- Update the documentation and comments of the tracepoint.
- Code cleanup.

Changes since v6:
- Add Reviewed-by tags.

Haifeng Xu (2):
x86/resctrl: Rename pseudo_lock_event.h to trace.h
x86/resctrl: Add tracepoint for llc_occupancy tracking

Documentation/arch/x86/resctrl.rst | 6 +++++
arch/x86/kernel/cpu/resctrl/monitor.c | 11 +++++++++
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
.../resctrl/{pseudo_lock_event.h => trace.h} | 24 +++++++++++++++----
4 files changed, 38 insertions(+), 5 deletions(-)
rename arch/x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} (56%)

--
2.25.1



2024-04-08 09:24:22

by Haifeng Xu

[permalink] [raw]
Subject: [PATCH v7 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h

Now only pseudo-locking part uses tracepoints to do event tracking, but
other parts of resctrl may need new tracepoints. It is unnecessary to
create separate header files and define CREATE_TRACE_POINTS in different
c files which fragments the resctrl tracing.

Therefore, give the resctrl tracepoint header file a generic name to
support its use for tracepoints that are not specific to pseudo-locking.

No functional change.

Signed-off-by: Haifeng Xu <[email protected]>
Suggested-by: Reinette Chatre <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
---
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
.../kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} | 8 ++++----
2 files changed, 5 insertions(+), 5 deletions(-)
rename arch/x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} (86%)

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 884b88e25141..492c8e28c4ce 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -31,7 +31,7 @@
#include "internal.h"

#define CREATE_TRACE_POINTS
-#include "pseudo_lock_event.h"
+#include "trace.h"

/*
* The bits needed to disable hardware prefetching varies based on the
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h b/arch/x86/kernel/cpu/resctrl/trace.h
similarity index 86%
rename from arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
rename to arch/x86/kernel/cpu/resctrl/trace.h
index 428ebbd4270b..495fb90c8572 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
+++ b/arch/x86/kernel/cpu/resctrl/trace.h
@@ -2,8 +2,8 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM resctrl

-#if !defined(_TRACE_PSEUDO_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_PSEUDO_LOCK_H
+#if !defined(_TRACE_RESCTRL_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_RESCTRL_H

#include <linux/tracepoint.h>

@@ -35,9 +35,9 @@ TRACE_EVENT(pseudo_lock_l3,
TP_printk("hits=%llu miss=%llu",
__entry->l3_hits, __entry->l3_miss));

-#endif /* _TRACE_PSEUDO_LOCK_H */
+#endif /* _TRACE_RESCTRL_H */

#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
-#define TRACE_INCLUDE_FILE pseudo_lock_event
+#define TRACE_INCLUDE_FILE trace
#include <trace/define_trace.h>
--
2.25.1


2024-04-08 09:24:35

by Haifeng Xu

[permalink] [raw]
Subject: [PATCH v7 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking

In our production environment, after removing monitor groups, those unused
RMIDs get stuck in the limbo list forever because their llc_occupancy are
always larger than the threshold. But the unused RMIDs can be successfully
freed by turning up the threshold.

In order to know how much the threshold should be, perf can be used to
acquire the llc_occupancy of RMIDs in each rdt domain.

Instead of using perf tool to track llc_occupancy and filter the log
manually, it is more convenient for users to use tracepoint to do this
work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs
when scanning the limbo list.

Signed-off-by: Haifeng Xu <[email protected]>
Suggested-by: Reinette Chatre <[email protected]>
Suggested-by: James Morse <[email protected]>
Reviewed-by: James Morse <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
---
Documentation/arch/x86/resctrl.rst | 6 ++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 11 +++++++++++
arch/x86/kernel/cpu/resctrl/trace.h | 16 ++++++++++++++++
3 files changed, 33 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a6279df64a9d..bcdbd23cd8a7 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -446,6 +446,12 @@ during mkdir.
max_threshold_occupancy is a user configurable value to determine the
occupancy at which an RMID can be freed.

+The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes
+for a subset of RMID that are not immediately available for allocation.
+This can't be relied on to produce output every second, it may be necessary
+to attempt to create an empty monitor group to force an update. Output may
+only be produced if creation of a control or monitor group fails.
+
Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c34a35ec0f03..2345e6836593 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -24,6 +24,7 @@
#include <asm/resctrl.h>

#include "internal.h"
+#include "trace.h"

/**
* struct rmid_entry - dirty tracking for all RMID.
@@ -354,6 +355,16 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
rmid_dirty = true;
} else {
rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+
+ /*
+ * x86's CLOSID and RMID are independent numbers, so the entry's
+ * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the
+ * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't
+ * used to select the configuration. It is thus necessary to track both
+ * CLOSID and RMID because there may be dependencies between them
+ * on some architectures.
+ */
+ trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val);
}

if (force_free || !rmid_dirty) {
diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
index 495fb90c8572..2a506316b303 100644
--- a/arch/x86/kernel/cpu/resctrl/trace.h
+++ b/arch/x86/kernel/cpu/resctrl/trace.h
@@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3,
TP_printk("hits=%llu miss=%llu",
__entry->l3_hits, __entry->l3_miss));

+TRACE_EVENT(mon_llc_occupancy_limbo,
+ TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes),
+ TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes),
+ TP_STRUCT__entry(__field(u32, ctrl_hw_id)
+ __field(u32, mon_hw_id)
+ __field(int, domain_id)
+ __field(u64, llc_occupancy_bytes)),
+ TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id;
+ __entry->mon_hw_id = mon_hw_id;
+ __entry->domain_id = domain_id;
+ __entry->llc_occupancy_bytes = llc_occupancy_bytes;),
+ TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_id=%d llc_occupancy_bytes=%llu",
+ __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id,
+ __entry->llc_occupancy_bytes)
+ );
+
#endif /* _TRACE_RESCTRL_H */

#undef TRACE_INCLUDE_PATH
--
2.25.1


Subject: [tip: x86/cache] x86/resctrl: Add tracepoint for llc_occupancy tracking

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 931be446c6cbc15691dd499957e961f4e1d56afb
Gitweb: https://git.kernel.org/tip/931be446c6cbc15691dd499957e961f4e1d56afb
Author: Haifeng Xu <[email protected]>
AuthorDate: Mon, 08 Apr 2024 17:23:03 +08:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 24 Apr 2024 14:24:48 +02:00

x86/resctrl: Add tracepoint for llc_occupancy tracking

In our production environment, after removing monitor groups, those
unused RMIDs get stuck in the limbo list forever because their
llc_occupancy is always larger than the threshold. But the unused RMIDs
can be successfully freed by turning up the threshold.

In order to know how much the threshold should be, perf can be used to
acquire the llc_occupancy of RMIDs in each rdt domain.

Instead of using perf tool to track llc_occupancy and filter the log
manually, it is more convenient for users to use tracepoint to do this
work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs
when scanning the limbo list.

Suggested-by: Reinette Chatre <[email protected]>
Suggested-by: James Morse <[email protected]>
Signed-off-by: Haifeng Xu <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: James Morse <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
Documentation/arch/x86/resctrl.rst | 6 ++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 11 +++++++++++
arch/x86/kernel/cpu/resctrl/trace.h | 16 ++++++++++++++++
3 files changed, 33 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 6c24558..627e238 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -446,6 +446,12 @@ during mkdir.
max_threshold_occupancy is a user configurable value to determine the
occupancy at which an RMID can be freed.

+The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes
+for a subset of RMID that are not immediately available for allocation.
+This can't be relied on to produce output every second, it may be necessary
+to attempt to create an empty monitor group to force an update. Output may
+only be produced if creation of a control or monitor group fails.
+
Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c34a35e..2345e68 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -24,6 +24,7 @@
#include <asm/resctrl.h>

#include "internal.h"
+#include "trace.h"

/**
* struct rmid_entry - dirty tracking for all RMID.
@@ -354,6 +355,16 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
rmid_dirty = true;
} else {
rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+
+ /*
+ * x86's CLOSID and RMID are independent numbers, so the entry's
+ * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the
+ * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't
+ * used to select the configuration. It is thus necessary to track both
+ * CLOSID and RMID because there may be dependencies between them
+ * on some architectures.
+ */
+ trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val);
}

if (force_free || !rmid_dirty) {
diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
index 495fb90..2a50631 100644
--- a/arch/x86/kernel/cpu/resctrl/trace.h
+++ b/arch/x86/kernel/cpu/resctrl/trace.h
@@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3,
TP_printk("hits=%llu miss=%llu",
__entry->l3_hits, __entry->l3_miss));

+TRACE_EVENT(mon_llc_occupancy_limbo,
+ TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes),
+ TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes),
+ TP_STRUCT__entry(__field(u32, ctrl_hw_id)
+ __field(u32, mon_hw_id)
+ __field(int, domain_id)
+ __field(u64, llc_occupancy_bytes)),
+ TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id;
+ __entry->mon_hw_id = mon_hw_id;
+ __entry->domain_id = domain_id;
+ __entry->llc_occupancy_bytes = llc_occupancy_bytes;),
+ TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_id=%d llc_occupancy_bytes=%llu",
+ __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id,
+ __entry->llc_occupancy_bytes)
+ );
+
#endif /* _TRACE_RESCTRL_H */

#undef TRACE_INCLUDE_PATH

Subject: [tip: x86/cache] x86/resctrl: Rename pseudo_lock_event.h to trace.h

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 87739229485ac724849178eb6c35e38c6161eb77
Gitweb: https://git.kernel.org/tip/87739229485ac724849178eb6c35e38c6161eb77
Author: Haifeng Xu <[email protected]>
AuthorDate: Mon, 08 Apr 2024 17:23:02 +08:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 24 Apr 2024 14:21:52 +02:00

x86/resctrl: Rename pseudo_lock_event.h to trace.h

Now only the pseudo-locking part uses tracepoints to do event tracking,
but other parts of resctrl may need new tracepoints. It is unnecessary
to create separate header files and define CREATE_TRACE_POINTS in
different c files which fragments the resctrl tracing.

Therefore, give the resctrl tracepoint header file a generic name to
support its use for tracepoints that are not specific to pseudo-locking.

No functional change.

Suggested-by: Reinette Chatre <[email protected]>
Signed-off-by: Haifeng Xu <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h | 43 +----------------
arch/x86/kernel/cpu/resctrl/trace.h | 43 ++++++++++++++++-
3 files changed, 44 insertions(+), 44 deletions(-)
delete mode 100644 arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
create mode 100644 arch/x86/kernel/cpu/resctrl/trace.h

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 884b88e..492c8e2 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -31,7 +31,7 @@
#include "internal.h"

#define CREATE_TRACE_POINTS
-#include "pseudo_lock_event.h"
+#include "trace.h"

/*
* The bits needed to disable hardware prefetching varies based on the
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h b/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
deleted file mode 100644
index 428ebbd..0000000
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
+++ /dev/null
@@ -1,43 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#undef TRACE_SYSTEM
-#define TRACE_SYSTEM resctrl
-
-#if !defined(_TRACE_PSEUDO_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_PSEUDO_LOCK_H
-
-#include <linux/tracepoint.h>
-
-TRACE_EVENT(pseudo_lock_mem_latency,
- TP_PROTO(u32 latency),
- TP_ARGS(latency),
- TP_STRUCT__entry(__field(u32, latency)),
- TP_fast_assign(__entry->latency = latency),
- TP_printk("latency=%u", __entry->latency)
- );
-
-TRACE_EVENT(pseudo_lock_l2,
- TP_PROTO(u64 l2_hits, u64 l2_miss),
- TP_ARGS(l2_hits, l2_miss),
- TP_STRUCT__entry(__field(u64, l2_hits)
- __field(u64, l2_miss)),
- TP_fast_assign(__entry->l2_hits = l2_hits;
- __entry->l2_miss = l2_miss;),
- TP_printk("hits=%llu miss=%llu",
- __entry->l2_hits, __entry->l2_miss));
-
-TRACE_EVENT(pseudo_lock_l3,
- TP_PROTO(u64 l3_hits, u64 l3_miss),
- TP_ARGS(l3_hits, l3_miss),
- TP_STRUCT__entry(__field(u64, l3_hits)
- __field(u64, l3_miss)),
- TP_fast_assign(__entry->l3_hits = l3_hits;
- __entry->l3_miss = l3_miss;),
- TP_printk("hits=%llu miss=%llu",
- __entry->l3_hits, __entry->l3_miss));
-
-#endif /* _TRACE_PSEUDO_LOCK_H */
-
-#undef TRACE_INCLUDE_PATH
-#define TRACE_INCLUDE_PATH .
-#define TRACE_INCLUDE_FILE pseudo_lock_event
-#include <trace/define_trace.h>
diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
new file mode 100644
index 0000000..495fb90
--- /dev/null
+++ b/arch/x86/kernel/cpu/resctrl/trace.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM resctrl
+
+#if !defined(_TRACE_RESCTRL_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_RESCTRL_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(pseudo_lock_mem_latency,
+ TP_PROTO(u32 latency),
+ TP_ARGS(latency),
+ TP_STRUCT__entry(__field(u32, latency)),
+ TP_fast_assign(__entry->latency = latency),
+ TP_printk("latency=%u", __entry->latency)
+ );
+
+TRACE_EVENT(pseudo_lock_l2,
+ TP_PROTO(u64 l2_hits, u64 l2_miss),
+ TP_ARGS(l2_hits, l2_miss),
+ TP_STRUCT__entry(__field(u64, l2_hits)
+ __field(u64, l2_miss)),
+ TP_fast_assign(__entry->l2_hits = l2_hits;
+ __entry->l2_miss = l2_miss;),
+ TP_printk("hits=%llu miss=%llu",
+ __entry->l2_hits, __entry->l2_miss));
+
+TRACE_EVENT(pseudo_lock_l3,
+ TP_PROTO(u64 l3_hits, u64 l3_miss),
+ TP_ARGS(l3_hits, l3_miss),
+ TP_STRUCT__entry(__field(u64, l3_hits)
+ __field(u64, l3_miss)),
+ TP_fast_assign(__entry->l3_hits = l3_hits;
+ __entry->l3_miss = l3_miss;),
+ TP_printk("hits=%llu miss=%llu",
+ __entry->l3_hits, __entry->l3_miss));
+
+#endif /* _TRACE_RESCTRL_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace
+#include <trace/define_trace.h>