2018-12-18 02:31:37

by Aubrey Li

Subject: [RESEND PATCH v5 1/3] x86/fpu: track AVX-512 usage of tasks

User space tools that do automated task placement need information
about the AVX-512 usage of tasks, because AVX-512 usage can lower the
core turbo frequency and slow down the task running on the sibling CPU.

The XSAVE hardware structure has bits that indicate when valid state
is present in registers unique to AVX-512 use. Use these bits to
indicate when AVX-512 has been in use and add per-task AVX-512 state
timestamp tracking to context switch.

Well-written AVX-512 applications are expected to clear the AVX-512
state when not actively using AVX-512 registers, so the tracking
mechanism is imprecise and can theoretically miss AVX-512 usage during
context switch. But it has been measured to be precise enough to be
useful under real-world workloads like tensorflow and linpack.

If higher precision is required, user space tools should use the
PMU-based mechanisms in combination with this one.

Signed-off-by: Aubrey Li <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Arjan van de Ven <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 7 +++++++
arch/x86/include/asm/fpu/types.h | 7 +++++++
2 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index a38bf5a1e37a..8778ac172255 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -411,6 +411,13 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
 {
 	if (likely(use_xsave())) {
 		copy_xregs_to_kernel(&fpu->state.xsave);
+
+		/*
+		 * AVX512 state is tracked here because its use is
+		 * known to slow the max clock speed of the core.
+		 */
+		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
+			fpu->avx512_timestamp = jiffies_64;
 		return 1;
 	}

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 202c53918ecf..81393dabdb46 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -302,6 +302,13 @@ struct fpu {
 	 */
 	unsigned char initialized;

+	/*
+	 * @avx512_timestamp:
+	 *
+	 * Records the timestamp of AVX512 use during last context switch.
+	 */
+	u64 avx512_timestamp;
+
 	/*
 	 * @state:
 	 *
--
2.17.1
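
For readers following along outside the kernel tree, the check added to
copy_fpregs_to_fpstate() can be modelled in plain user-space C. This is a
minimal sketch, not the kernel code: struct task_model,
model_context_switch_out() and the `now` parameter are hypothetical
stand-ins for struct fpu, the XSAVE save path and jiffies_64. The mask is
built from the three AVX-512 state components (opmask, ZMM_Hi256,
Hi16_ZMM) under the MODEL_ prefix so that it is not mistaken for the
kernel's own XFEATURE_MASK_AVX512 definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XSAVE state-component bits for the three AVX-512 register groups. */
#define MODEL_XFEATURE_OPMASK    (1ULL << 5)  /* k0-k7 mask registers */
#define MODEL_XFEATURE_ZMM_HI256 (1ULL << 6)  /* upper halves of ZMM0-15 */
#define MODEL_XFEATURE_HI16_ZMM  (1ULL << 7)  /* ZMM16-31 */
#define MODEL_XFEATURE_MASK_AVX512 \
	(MODEL_XFEATURE_OPMASK | MODEL_XFEATURE_ZMM_HI256 | \
	 MODEL_XFEATURE_HI16_ZMM)

/* Hypothetical stand-in for the per-task FPU bookkeeping. */
struct task_model {
	uint64_t xfeatures;        /* feature bitmap as saved by XSAVE */
	uint64_t avx512_timestamp; /* 0 means "never observed" */
};

/*
 * Model of the save path at context switch: if any AVX-512 component
 * holds valid (non-init) state, remember when it was seen.
 */
void model_context_switch_out(struct task_model *t, uint64_t now)
{
	assert(t != NULL);
	if (t->xfeatures & MODEL_XFEATURE_MASK_AVX512)
		t->avx512_timestamp = now;
}
```

A task whose AVX-512 components are back in their init state at
switch-out leaves the timestamp untouched, which is the imprecision the
commit message describes.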



2018-12-18 02:31:39

by Aubrey Li

Subject: [RESEND PATCH v5 2/3] proc: add AVX-512 usage elapsed time to /proc/pid/status

Use of AVX-512 components can cause the core turbo frequency to drop,
so it is useful to expose the time elapsed since the last AVX-512 use
as a heuristic hint that lets a user space job scheduler cluster
AVX-512 tasks together.

Example:
$ cat /proc/pid/status | grep AVX512_elapsed_ms
AVX512_elapsed_ms: 1020

The value '1020' means that 1020 milliseconds have elapsed since the
last context switch at which the task, now off-CPU, was seen with
AVX-512 state in use, so the task may cause a core frequency drop.

Or:
$ cat /proc/pid/status | grep AVX512_elapsed_ms
AVX512_elapsed_ms: -1

The value '-1' indicates that the task has never been observed using
AVX-512 components and is therefore unlikely to cause a frequency drop.

User space tools may want to further check by:

$ perf stat --pid <pid> -e core_power.lvl2_turbo_license -- sleep 1

Performance counter stats for process id '3558':

3,251,565,961 core_power.lvl2_turbo_license

1.004031387 seconds time elapsed

A non-zero counter value confirms that the task causes a frequency drop.

Signed-off-by: Aubrey Li <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Arjan van de Ven <[email protected]>
---
arch/x86/kernel/fpu/xstate.c | 34 ++++++++++++++++++++++++++++++++++
fs/proc/array.c | 5 +++++
2 files changed, 39 insertions(+)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 87a57b7642d3..d084b1dc80a6 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -7,6 +7,7 @@
#include <linux/cpu.h>
#include <linux/mman.h>
#include <linux/pkeys.h>
+#include <linux/seq_file.h>

#include <asm/fpu/api.h>
#include <asm/fpu/internal.h>
@@ -1245,3 +1246,36 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)

 	return 0;
 }
+
+/*
+ * Report the amount of time elapsed in millisecond since last AVX512
+ * use in the task.
+ */
+void avx512_state(struct seq_file *m, struct task_struct *task)
+{
+	u64 timestamp = task->thread.fpu.avx512_timestamp;
+	s64 delta;
+
+	if (!timestamp)
+		delta = -1;
+	else {
+		WARN_ON_ONCE(jiffies_64 < timestamp);
+		delta = div_u64(jiffies64_to_nsecs(jiffies_64 - timestamp),
+				NSEC_PER_MSEC);
+	}
+
+	seq_put_decimal_ll(m, "AVX512_elapsed_ms:\t", delta);
+	seq_putc(m, '\n');
+}
+
+/*
+ * Report CPU specific thread state
+ */
+void arch_task_state(struct seq_file *m, struct task_struct *task)
+{
+	/*
+	 * Report AVX512 state if the processor and build option supported.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
+		avx512_state(m, task);
+}
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 0ceb3b6b37e7..dd88c2219f08 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -392,6 +392,10 @@ static inline void task_core_dumping(struct seq_file *m, struct mm_struct *mm)
 	seq_putc(m, '\n');
 }

+void __weak arch_task_state(struct seq_file *m, struct task_struct *task)
+{
+}
+
 int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task)
 {
@@ -414,6 +418,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 	task_cpus_allowed(m, task);
 	cpuset_task_status_allowed(m, task);
 	task_context_switch_counts(m, task);
+	arch_task_state(m, task);
 	return 0;
 }

--
2.17.1
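
The arithmetic behind the reported value can be checked in user space.
The sketch below is an illustration under stated assumptions, not the
kernel implementation: model_avx512_elapsed_ms() and its `hz` parameter
(ticks per second) are hypothetical, and it converts ticks directly to
milliseconds instead of going through jiffies64_to_nsecs(). Clamping a
backwards clock to 0 is a choice made for this model only; the patch
emits a one-time warning in that case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the value printed as AVX512_elapsed_ms.
 * `now` and `timestamp` are tick counts, `hz` is ticks per second.
 * A timestamp of 0 is the "never used AVX-512" sentinel and maps to -1.
 */
int64_t model_avx512_elapsed_ms(uint64_t now, uint64_t timestamp,
				uint32_t hz)
{
	assert(hz > 0);

	if (timestamp == 0)
		return -1;

	/* Model-only choice: treat a backwards clock as "just now". */
	if (now < timestamp)
		return 0;

	/* ticks -> milliseconds; assumes the product fits in 64 bits */
	return (int64_t)((now - timestamp) * 1000 / hz);
}
```

With hz = 250, a gap of 255 ticks yields the 1020 ms shown in the
example output of the commit message.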


2018-12-18 02:33:05

by Aubrey Li

Subject: [RESEND PATCH v5 3/3] Documentation/filesystems/proc.txt: add AVX512_elapsed_ms

AVX512_elapsed_ms has been added to /proc/<pid>/status. Document it
in Documentation/filesystems/proc.txt.

Signed-off-by: Aubrey Li <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Arjan van de Ven <[email protected]>
---
Documentation/filesystems/proc.txt | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 520f6a84cf50..c4be304bce69 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -197,6 +197,7 @@ read the file /proc/PID/status:
Seccomp: 0
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1
+ AVX512_elapsed_ms: 1020

This shows you nearly the same information you would get if you viewed it with
the ps command. In fact, ps uses the proc file system to obtain its
@@ -214,7 +215,7 @@ asynchronous manner and the value may not be very precise. To see a precise
snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
It's slow but very precise.

-Table 1-2: Contents of the status files (as of 4.8)
+Table 1-2: Contents of the status files (as of 4.21)
..............................................................................
Field Content
Name filename of the executable
@@ -275,6 +276,7 @@ Table 1-2: Contents of the status files (as of 4.8)
Mems_allowed_list Same as previous, but in "list format"
voluntary_ctxt_switches number of voluntary context switches
nonvoluntary_ctxt_switches number of non voluntary context switches
+ AVX512_elapsed_ms time elapsed since last AVX512 use, in milliseconds
..............................................................................

Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
--
2.17.1
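
A user space tool consuming the new field might parse it roughly as
follows. This is a sketch only: parse_avx512_elapsed_ms() is a
hypothetical helper, and it assumes the line format produced by patch
2/3 (the key "AVX512_elapsed_ms:" followed by a tab and a signed decimal
number). Reading /proc/<pid>/status into a buffer is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define AVX512_FIELD "AVX512_elapsed_ms:"

/*
 * Scan the text of /proc/<pid>/status for the AVX512_elapsed_ms line.
 * Returns 1 and stores the value in *out_ms when the field is present,
 * 0 when it is absent (for example on a kernel without this series or
 * a CPU without AVX-512).
 */
int parse_avx512_elapsed_ms(const char *status_text, long long *out_ms)
{
	const size_t key_len = strlen(AVX512_FIELD);
	const char *line = status_text;

	assert(status_text != NULL && out_ms != NULL);

	while (line && *line) {
		if (strncmp(line, AVX512_FIELD, key_len) == 0) {
			/* strtoll() skips the tab and handles "-1" */
			*out_ms = strtoll(line + key_len, NULL, 10);
			return 1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return 0;
}
```

Keeping "field absent" distinct from the value -1 lets a tool tell an
unsupported system apart from a task that has never been seen using
AVX-512.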


2018-12-18 08:35:06

by Thomas Gleixner

Subject: Re: [RESEND PATCH v5 1/3] x86/fpu: track AVX-512 usage of tasks

Aubrey,

On Tue, 18 Dec 2018, Aubrey Li wrote:

RESEND....

Please don't do that. This is not a resend because you changed something,
so it's a new version. Usually I ignore resends when I have the original
submission already lined up for review.

Thanks,

tglx





2018-12-18 08:38:08

by Li, Aubrey

Subject: Re: [RESEND PATCH v5 1/3] x86/fpu: track AVX-512 usage of tasks

On 2018/12/18 16:33, Thomas Gleixner wrote:
> Aubrey,
>
> On Tue, 18 Dec 2018, Aubrey Li wrote:
>
> RESEND....
>
> Please don't do that. This is not a resend because you changed something,
> so it's a new version. Usually I ignore resends when I have the original
> submission already lined up for review.

oh, okay, I'll send another version, sorry for the trouble...

Thanks,
-Aubrey