AMX state is a large state (at least 8KB or more). Entering CPU idle with
this non-initialized large state may result in shallow states while a
deeper low-power state is available.
We can confirm this behavior is implementation-specific. Section 3.3 in [1]
will be updated to clarify this. Sorry for the delay in this conclusion
and the update here.
Changes from v1 [2]:
* Drop the code to call TILERELEASE from arch_cpu_idle_enter(). Instead,
do it from the intel-idle driver, as the behavior is *not* architectural.
* Simplify the driver code change (Rui).
* Improve the dynamic state flag check in the helper function.
* Drop the new C-state table as it is done by another patchset [3].
* Drop the opcode map changes as tip's x86/misc [4] has merged it already.
With the last two changes, the series depends on these:
* The C-state table update for Sapphire Rapids in linux-pm [3].
* The update of the x86 opcode map in the tip's branch [4].
The series on top of the above is available here:
git://github.com/intel/amx-linux.git tilerelease
Thanks,
Chang
[1]: Intel Architecture Instruction Set Extension Programming Reference
May 2021, https://software.intel.com/content/dam/develop/external/us/en/documents-tps/architecture-instruction-set-extensions-programming-reference.pdf
[2]: https://lore.kernel.org/lkml/[email protected]/
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=9edf3c0ffef0ec1bed8300315852b5c6a0997130
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/misc&id=9dd94df75b30eca03ed2151dd5bbc152a6f19abf
Chang S. Bae (2):
x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
intel_idle: Add a new flag to initialize the AMX state
arch/x86/include/asm/fpu/api.h | 2 ++
arch/x86/include/asm/special_insns.h | 9 +++++++++
arch/x86/kernel/fpu/core.c | 15 +++++++++++++++
drivers/idle/intel_idle.c | 18 ++++++++++++++++--
4 files changed, 42 insertions(+), 2 deletions(-)
--
2.17.1
When a CPU enters an idle state, non-initialized states left in large
registers may be the cause of preventing deeper low-power states.
The new helper ensures the AMX state is initialized to make the CPU
ready for low-power states. It will be used by the intel idle driver.
Signed-off-by: Chang S. Bae <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v1:
* Check the dynamic state flag first, to avoid #UD with XGETBV(1).
Dependency on the opcode map update:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/misc&id=9dd94df75b30eca03ed2151dd5bbc152a6f19abf
---
arch/x86/include/asm/fpu/api.h | 2 ++
arch/x86/include/asm/special_insns.h | 9 +++++++++
arch/x86/kernel/fpu/core.c | 15 +++++++++++++++
3 files changed, 26 insertions(+)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index c83b3020350a..df48912fd1c8 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -165,4 +165,6 @@ static inline bool fpstate_is_confidential(struct fpu_guest *gfpu)
struct task_struct;
extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2);
+extern void fpu_idle_fpregs(void);
+
#endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 68c257a3de0d..d434fbaeb3ff 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -294,6 +294,15 @@ static inline int enqcmds(void __iomem *dst, const void *src)
return 0;
}
+static inline void tile_release(void)
+{
+ /*
+ * Instruction opcode for TILERELEASE; supported in binutils
+ * version >= 2.36.
+ */
+ asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0");
+}
+
#endif /* __KERNEL__ */
#endif /* _ASM_X86_SPECIAL_INSNS_H */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8dea01ffc5c1..572dfa962b7b 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -847,3 +847,18 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr)
*/
return 0;
}
+
+/*
+ * Initialize register state that may prevent from entering low-power idle.
+ * This function will be invoked from the cpuidle driver only when needed.
+ */
+void fpu_idle_fpregs(void)
+{
+ if (!fpu_state_size_dynamic())
+ return;
+
+ if (xfeatures_in_use() & XFEATURE_MASK_XTILE) {
+ tile_release();
+ fpregs_deactivate(¤t->thread.fpu);
+ }
+}
--
2.17.1
The non-initialized AMX state can be the cause of C-state demotion from C6
to C1E. This low-power idle state may improve power savings and thus result
in a higher available turbo frequency budget.
This behavior is implementation-specific. Initialize the state for the C6
entrance of Sapphire Rapids as needed.
Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Tested-by : Zhang Rui <[email protected]>
Cc: Artem Bityutskiy <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes from v1:
* Simplify the code with a new flag (Rui).
* Rebase on Artem's patches for SPR intel_idle.
* Massage the changelog.
Dependency on the new C-state table for SPR:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=9edf3c0ffef0ec1bed8300315852b5c6a0997130
---
drivers/idle/intel_idle.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 1c7c25909e54..6ecbeffdf785 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -54,6 +54,7 @@
#include <asm/intel-family.h>
#include <asm/mwait.h>
#include <asm/msr.h>
+#include <asm/fpu/api.h>
#define INTEL_IDLE_VERSION "0.5.1"
@@ -99,6 +100,11 @@ static unsigned int mwait_substates __initdata;
*/
#define CPUIDLE_FLAG_ALWAYS_ENABLE BIT(15)
+/*
+ * Initialize large xstate for the C6-state entrance.
+ */
+#define CPUIDLE_FLAG_INIT_XSTATE BIT(16)
+
/*
* MWAIT takes an 8-bit "hint" in EAX "suggesting"
* the C-state (top nibble) and sub-state (bottom nibble)
@@ -136,6 +142,9 @@ static __cpuidle int intel_idle(struct cpuidle_device *dev,
if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE)
local_irq_enable();
+ if (state->flags & CPUIDLE_FLAG_INIT_XSTATE)
+ fpu_idle_fpregs();
+
mwait_idle_with_hints(eax, ecx);
return index;
@@ -156,8 +165,12 @@ static __cpuidle int intel_idle(struct cpuidle_device *dev,
static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
- unsigned long eax = flg2MWAIT(drv->states[index].flags);
unsigned long ecx = 1; /* break on interrupt flag */
+ struct cpuidle_state *state = &drv->states[index];
+ unsigned long eax = flg2MWAIT(state->flags);
+
+ if (state->flags & CPUIDLE_FLAG_INIT_XSTATE)
+ fpu_idle_fpregs();
mwait_idle_with_hints(eax, ecx);
@@ -792,7 +805,8 @@ static struct cpuidle_state spr_cstates[] __initdata = {
{
.name = "C6",
.desc = "MWAIT 0x20",
- .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | \
+ CPUIDLE_FLAG_INIT_XSTATE,
.exit_latency = 290,
.target_residency = 800,
.enter = &intel_idle,
--
2.17.1
On 3/9/22 14:34, Chang S. Bae wrote:
> +/*
> + * Initialize register state that may prevent from entering low-power idle.
> + * This function will be invoked from the cpuidle driver only when needed.
> + */
> +void fpu_idle_fpregs(void)
> +{
> + if (!fpu_state_size_dynamic())
> + return;
Is this check just an optimization? I'm having trouble imagining a
situation where we would have:
(xfeatures_in_use() & XFEATURE_MASK_XTILE) == true
but
fpu_state_size_dynamic() == false
> + if (xfeatures_in_use() & XFEATURE_MASK_XTILE) {
> + tile_release();
> + fpregs_deactivate(¤t->thread.fpu);
> + }
> +}
xfeatures_in_use() isn't exactly expensive either.
On Wed, Mar 9, 2022 at 11:42 PM Chang S. Bae <[email protected]> wrote:
>
> The non-initialized AMX state can be the cause of C-state demotion from C6
> to C1E. This low-power idle state may improve power savings and thus result
> in a higher available turbo frequency budget.
>
> This behavior is implementation-specific. Initialize the state for the C6
> entrance of Sapphire Rapids as needed.
>
> Suggested-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Chang S. Bae <[email protected]>
> Tested-by : Zhang Rui <[email protected]>
> Cc: Artem Bityutskiy <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> Changes from v1:
> * Simplify the code with a new flag (Rui).
> * Rebase on Artem's patches for SPR intel_idle.
> * Massage the changelog.
>
> Dependency on the new C-state table for SPR:
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=9edf3c0ffef0ec1bed8300315852b5c6a0997130
> ---
> drivers/idle/intel_idle.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 1c7c25909e54..6ecbeffdf785 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -54,6 +54,7 @@
> #include <asm/intel-family.h>
> #include <asm/mwait.h>
> #include <asm/msr.h>
> +#include <asm/fpu/api.h>
>
> #define INTEL_IDLE_VERSION "0.5.1"
>
> @@ -99,6 +100,11 @@ static unsigned int mwait_substates __initdata;
> */
> #define CPUIDLE_FLAG_ALWAYS_ENABLE BIT(15)
>
> +/*
> + * Initialize large xstate for the C6-state entrance.
> + */
> +#define CPUIDLE_FLAG_INIT_XSTATE BIT(16)
> +
> /*
> * MWAIT takes an 8-bit "hint" in EAX "suggesting"
> * the C-state (top nibble) and sub-state (bottom nibble)
> @@ -136,6 +142,9 @@ static __cpuidle int intel_idle(struct cpuidle_device *dev,
> if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE)
> local_irq_enable();
>
> + if (state->flags & CPUIDLE_FLAG_INIT_XSTATE)
> + fpu_idle_fpregs();
> +
> mwait_idle_with_hints(eax, ecx);
>
> return index;
> @@ -156,8 +165,12 @@ static __cpuidle int intel_idle(struct cpuidle_device *dev,
> static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev,
> struct cpuidle_driver *drv, int index)
> {
> - unsigned long eax = flg2MWAIT(drv->states[index].flags);
> unsigned long ecx = 1; /* break on interrupt flag */
> + struct cpuidle_state *state = &drv->states[index];
> + unsigned long eax = flg2MWAIT(state->flags);
> +
> + if (state->flags & CPUIDLE_FLAG_INIT_XSTATE)
> + fpu_idle_fpregs();
>
> mwait_idle_with_hints(eax, ecx);
>
> @@ -792,7 +805,8 @@ static struct cpuidle_state spr_cstates[] __initdata = {
> {
> .name = "C6",
> .desc = "MWAIT 0x20",
> - .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
> + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | \
Why is the backslash at the end of the line needed?
> + CPUIDLE_FLAG_INIT_XSTATE,
> .exit_latency = 290,
> .target_residency = 800,
> .enter = &intel_idle,
> --
On 3/10/2022 10:34 AM, Rafael J. Wysocki wrote:
> On Wed, Mar 9, 2022 at 11:42 PM Chang S. Bae <[email protected]> wrote:
>>
[...]
>> @@ -792,7 +805,8 @@ static struct cpuidle_state spr_cstates[] __initdata = {
>> {
>> .name = "C6",
>> .desc = "MWAIT 0x20",
>> - .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
>> + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | \
>
> Why is the backslash at the end of the line needed?
No, it is not needed.
Sorry, I think I was mindlessly following the style in this new c-state
table:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/tree/drivers/idle/intel_idle.c?h=linux-next#n787
Thanks,
Chang
On Thu, 2022-03-10 at 10:50 -0800, Chang S. Bae wrote:
> On 3/10/2022 10:34 AM, Rafael J. Wysocki wrote:
> > On Wed, Mar 9, 2022 at 11:42 PM Chang S. Bae <[email protected]>
> > wrote:
> > >
> [...]
> > > @@ -792,7 +805,8 @@ static struct cpuidle_state spr_cstates[] __initdata =
> > > {
> > > {
> > > .name = "C6",
> > > .desc = "MWAIT 0x20",
> > > - .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
> > > + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | \
> >
> > Why is the backslash at the end of the line needed?
>
> No, it is not needed.
>
> Sorry, I think I was mindlessly following the style in this new c-state
> table:
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-
> pm.git/tree/drivers/idle/intel_idle.c?h=linux-next#n787
>
> Thanks,
> Chang
>
Sorry, too much python programming lately, so I automatically added the back-
slash. Let me know if you would remove that backslash at the same time, or I can
submit a patch.
Artem.
On 3/10/2022 11:33 PM, Artem Bityutskiy wrote:
>
> Let me know if you would remove that backslash at the same time, or I can
> submit a patch.
Looks like the table was merged without the backslash. Rafael thankfully
removed it via:
https://lore.kernel.org/linux-pm/4731792.31r3eYUQgx@kreacher/
Thanks,
Chang