LinuxLists.cc - [PATCH 0/6] add support for new persistent memory instructions

[permalink] [raw]

Subject: [PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

If clwb is available on the system, use it in drm_clflush_virt_range.
If clwb is not available, fall back to clflushopt if you can.
If clflushopt is not supported, fall all the way back to clflush.

Signed-off-by: Ross Zwisler <[email protected]>
Cc: H Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: David Airlie <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
drivers/gpu/drm/drm_cache.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index aad9d82..84e9a04 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -138,8 +138,8 @@ drm_clflush_virt_range(void *addr, unsigned long length)
void *end = addr + length;
mb();
for (; addr < end; addr += boot_cpu_data.x86_clflush_size)
- clflushopt(addr);
- clflushopt(end - 1);
+ clwb(addr);
+ clwb(end - 1);
mb();
return;
}
--
1.9.3

2014-11-11 18:44:05

[permalink] [raw]

Subject: [PATCH 4/6] x86: Use clwb in clflush_cache_range

If clwb is available on the system, use it in clflush_cache_range.
If clwb is not available, fall back to clflushopt if you can.
If clflushopt is not supported, fall all the way back to clflush.

Signed-off-by: Ross Zwisler <[email protected]>
Cc: H Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: David Airlie <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/mm/pageattr.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 36de293..5229d45 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -126,8 +126,8 @@ within(unsigned long addr, unsigned long start, unsigned long end)
* @vaddr: virtual start address
* @size: number of bytes to flush
*
- * clflushopt is an unordered instruction which needs fencing with mfence or
- * sfence to avoid ordering issues.
+ * clflushopt and clwb are unordered instructions which need fencing with
+ * mfence or sfence to avoid ordering issues.
*/
void clflush_cache_range(void *vaddr, unsigned int size)
{
@@ -136,11 +136,11 @@ void clflush_cache_range(void *vaddr, unsigned int size)
mb();

for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
- clflushopt(vaddr);
+ clwb(vaddr);
/*
* Flush any possible final partial cacheline:
*/
- clflushopt(vend);
+ clwb(vend);

mb();
}
--
1.9.3

2014-11-11 18:44:02

[permalink] [raw]

Subject: [PATCH 3/6] x86: Add support for the clwb instruction

Add support for the new clwb instruction. This instruction was
announced in the document "Intel Architecture Instruction Set Extensions
Programming Reference" with reference number 319433-022.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

Here are some things of note:

- As with the clflushopt patches before this, I'm assuming that the addressing
mode generated by the original clflush instruction will match the new
clflush instruction with the 0x66 prefix for clflushopt, and for the
xsaveopt instruction with the 0x66 prefix for clwb. For all the test cases
that I've come up with and for the new clwb code generated by this patch
series, this has proven to be true on my test machine.

- According to the SDM, xsaveopt has a form where it has a REX.W prefix. I
believe that this prefix will not be generated by gcc in x86_64 kernel code.
Based on this, I don't believe I need to account for this extra prefix when
dealing with the assembly language created for clwb. Please correct me if
I'm wrong.

Signed-off-by: Ross Zwisler <[email protected]>
Cc: H Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: David Airlie <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/include/asm/special_insns.h | 10 ++++++++++
2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index b3e6b89..fbbed34 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -227,6 +227,7 @@
#define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */
#define X86_FEATURE_PCOMMIT ( 9*32+22) /* PCOMMIT instruction */
#define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
+#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */
#define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
#define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
#define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 1709a2e..a328460 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -199,6 +199,16 @@ static inline void clflushopt(volatile void *__p)
"+m" (*(volatile char __force *)__p));
}

+static inline void clwb(volatile void *__p)
+{
+ alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
+ ".byte 0x66; clflush %P0",
+ X86_FEATURE_CLFLUSHOPT,
+ ".byte 0x66; xsaveopt %P0",
+ X86_FEATURE_CLWB,
+ "+m" (*(volatile char __force *)__p));
+}
+
static inline void pcommit(void)
{
alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
--
1.9.3

2014-11-11 18:45:09

[permalink] [raw]

Subject: [PATCH 2/6] x86/alternative: Add alternative_io_2

Add alternative_io_2 in the spirit of alternative_input_2 and
alternative_io. This will allow us to have instructions with an output
parameter that vary based on two CPU features.

Signed-off-by: Ross Zwisler <[email protected]>
Cc: H Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: David Airlie <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/include/asm/alternative.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 473bdbe..7d9ead9 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -180,6 +180,20 @@ static inline int alternatives_text_reserved(void *start, void *end)
asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) \
: output : "i" (0), ## input)

+/*
+ * This is similar to alternative_io. But it has two features and
+ * respective instructions.
+ *
+ * If CPU has feature2, newinstr2 is used.
+ * Otherwise, if CPU has feature1, newinstr1 is used.
+ * Otherwise, oldinstr is used.
+ */
+#define alternative_io_2(oldinstr, newinstr1, feature1, newinstr2, \
+ feature2, output, input...) \
+ asm volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, \
+ newinstr2, feature2) \
+ : output : "i" (0), ## input)
+
/* Like alternative_io, but for replacing a direct call with another one. */
#define alternative_call(oldfunc, newfunc, feature, output, input...) \
asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature) \
--
1.9.3

2014-11-11 18:45:08

[permalink] [raw]

Subject: [PATCH 5/6] x86: Use clwb in drm_clflush_page

If clwb is available on the system, use it in drm_clflush_page.
If clwb is not available, fall back to clflushopt if you can.
If clflushopt is not supported, fall all the way back to clflush.

Signed-off-by: Ross Zwisler <[email protected]>
Cc: H Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: David Airlie <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
drivers/gpu/drm/drm_cache.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index a6b6906..aad9d82 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -34,9 +34,9 @@
#if defined(CONFIG_X86)

/*
- * clflushopt is an unordered instruction which needs fencing with mfence or
- * sfence to avoid ordering issues. For drm_clflush_page this fencing happens
- * in the caller.
+ * clwb and clflushopt are unordered instructions which need fencing with
+ * mfence or sfence to avoid ordering issues. For drm_clflush_page this
+ * fencing happens in the caller.
*/
static void
drm_clflush_page(struct page *page)
@@ -50,7 +50,7 @@ drm_clflush_page(struct page *page)

page_virtual = kmap_atomic(page);
for (i = 0; i < PAGE_SIZE; i += size)
- clflushopt(page_virtual + i);
+ clwb(page_virtual + i);
kunmap_atomic(page_virtual);
}

--
1.9.3

2014-11-11 18:45:50

[permalink] [raw]

Subject: [PATCH 1/6] x86: Add support for the pcommit instruction

Add support for the new pcommit instruction. This instruction was
announced in the document "Intel Architecture Instruction Set Extensions
Programming Reference" with reference number 319433-022.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

Signed-off-by: Ross Zwisler <[email protected]>
Cc: H Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: David Airlie <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/include/asm/special_insns.h | 6 ++++++
2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 0bb1335..b3e6b89 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -225,6 +225,7 @@
#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
#define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */
#define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */
+#define X86_FEATURE_PCOMMIT ( 9*32+22) /* PCOMMIT instruction */
#define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
#define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
#define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index e820c08..1709a2e 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -199,6 +199,12 @@ static inline void clflushopt(volatile void *__p)
"+m" (*(volatile char __force *)__p));
}

+static inline void pcommit(void)
+{
+ alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
+ X86_FEATURE_PCOMMIT);
+}
+
#define nop() asm volatile ("nop")

--
1.9.3

2014-11-11 19:12:47

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> Add support for the new clwb instruction. This instruction was
> announced in the document "Intel Architecture Instruction Set Extensions
> Programming Reference" with reference number 319433-022.
>
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>
> Here are some things of note:
>
> - As with the clflushopt patches before this, I'm assuming that the addressing
> mode generated by the original clflush instruction will match the new
> clflush instruction with the 0x66 prefix for clflushopt, and for the
> xsaveopt instruction with the 0x66 prefix for clwb. For all the test cases
> that I've come up with and for the new clwb code generated by this patch
> series, this has proven to be true on my test machine.
>
> - According to the SDM, xsaveopt has a form where it has a REX.W prefix. I
> believe that this prefix will not be generated by gcc in x86_64 kernel code.
> Based on this, I don't believe I need to account for this extra prefix when
> dealing with the assembly language created for clwb. Please correct me if
> I'm wrong.
>
> Signed-off-by: Ross Zwisler <[email protected]>
> Cc: H Peter Anvin <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: David Airlie <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> arch/x86/include/asm/cpufeature.h | 1 +
> arch/x86/include/asm/special_insns.h | 10 ++++++++++
> 2 files changed, 11 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index b3e6b89..fbbed34 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -227,6 +227,7 @@
> #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */
> #define X86_FEATURE_PCOMMIT ( 9*32+22) /* PCOMMIT instruction */
> #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
> +#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */
> #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
> #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
> #define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index 1709a2e..a328460 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -199,6 +199,16 @@ static inline void clflushopt(volatile void *__p)
> "+m" (*(volatile char __force *)__p));
> }
>
> +static inline void clwb(volatile void *__p)
> +{
> + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",

Any particular reason for using 0x3e as a prefix to have the insns be
the same size or is it simply because CLFLUSH can stomach it?

:-)

> + ".byte 0x66; clflush %P0",
> + X86_FEATURE_CLFLUSHOPT,
> + ".byte 0x66; xsaveopt %P0",

Huh, XSAVEOPT?!? Shouldn't that be CLWB??

> + X86_FEATURE_CLWB,
> + "+m" (*(volatile char __force *)__p));
> +}
> +
> static inline void pcommit(void)
> {
> alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
> --
> 1.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-11 19:19:13

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Tue, Nov 11, 2014 at 08:12:39PM +0100, Borislav Petkov wrote:
> > + ".byte 0x66; xsaveopt %P0",
>
> Huh, XSAVEOPT?!? Shouldn't that be CLWB??

Bah, the same opcodes, only 0x66 prefix makes it into CLWB. Could use a
comment I guess.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-11 19:40:08

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Tue, 2014-11-11 at 20:19 +0100, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 08:12:39PM +0100, Borislav Petkov wrote:
> > > + ".byte 0x66; xsaveopt %P0",
> >
> > Huh, XSAVEOPT?!? Shouldn't that be CLWB??
>
> Bah, the same opcodes, only 0x66 prefix makes it into CLWB. Could use a
> comment I guess.

Yep, it's weird, I know. :) I'll add a comment.

Thanks,
- Ross

2014-11-11 19:46:28

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Tue, Nov 11, 2014 at 12:40:00PM -0700, Ross Zwisler wrote:
> Yep, it's weird, I know. :)

But sure, saving opcode space, makes sense to me.

Btw, I'd still be interested about this:

> +static inline void clwb(volatile void *__p)
> +{
> + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",

Any particular reason for using 0x3e as a prefix to have the insns be
the same size or is it simply because CLFLUSH can stomach it?

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-11 19:49:21

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Tue, 2014-11-11 at 20:12 +0100, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> > Add support for the new clwb instruction. This instruction was
> > announced in the document "Intel Architecture Instruction Set Extensions
> > Programming Reference" with reference number 319433-022.
> >
> > https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> >
> > Here are some things of note:
> >
> > - As with the clflushopt patches before this, I'm assuming that the addressing
> > mode generated by the original clflush instruction will match the new
> > clflush instruction with the 0x66 prefix for clflushopt, and for the
> > xsaveopt instruction with the 0x66 prefix for clwb. For all the test cases
> > that I've come up with and for the new clwb code generated by this patch
> > series, this has proven to be true on my test machine.
> >
> > - According to the SDM, xsaveopt has a form where it has a REX.W prefix. I
> > believe that this prefix will not be generated by gcc in x86_64 kernel code.
> > Based on this, I don't believe I need to account for this extra prefix when
> > dealing with the assembly language created for clwb. Please correct me if
> > I'm wrong.
> >
> > Signed-off-by: Ross Zwisler <[email protected]>
> > Cc: H Peter Anvin <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: David Airlie <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > ---
> > arch/x86/include/asm/cpufeature.h | 1 +
> > arch/x86/include/asm/special_insns.h | 10 ++++++++++
> > 2 files changed, 11 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> > index b3e6b89..fbbed34 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -227,6 +227,7 @@
> > #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */
> > #define X86_FEATURE_PCOMMIT ( 9*32+22) /* PCOMMIT instruction */
> > #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
> > +#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */
> > #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
> > #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
> > #define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */
> > diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> > index 1709a2e..a328460 100644
> > --- a/arch/x86/include/asm/special_insns.h
> > +++ b/arch/x86/include/asm/special_insns.h
> > @@ -199,6 +199,16 @@ static inline void clflushopt(volatile void *__p)
> > "+m" (*(volatile char __force *)__p));
> > }
> >
> > +static inline void clwb(volatile void *__p)
> > +{
> > + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
>
> Any particular reason for using 0x3e as a prefix to have the insns be
> the same size or is it simply because CLFLUSH can stomach it?
>
> :-)

Essentially we need one additional byte at the beginning of the clflush so
that we can flip it into a clflushopt by changing that byte into a 0x66
prefix. Two options are to either insert a 1 byte ASM_NOP1, or to add a 1
byte NOP_DS_PREFIX. Both have no functional effect with the plain clflush,
but I've been told that executing a clflush + prefix should be faster than
executing a clflush + NOP.

2014-11-11 19:54:45

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Tue, Nov 11, 2014 at 12:48:52PM -0700, Ross Zwisler wrote:
> Essentially we need one additional byte at the beginning of the clflush so
> that we can flip it into a clflushopt by changing that byte into a 0x66
> prefix. Two options are to either insert a 1 byte ASM_NOP1, or to add a 1
> byte NOP_DS_PREFIX. Both have no functional effect with the plain clflush,
> but I've been told that executing a clflush + prefix should be faster than
> executing a clflush + NOP.

I see. :)

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-11 19:55:10

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Tue, 2014-11-11 at 20:46 +0100, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 12:40:00PM -0700, Ross Zwisler wrote:
> > Yep, it's weird, I know. :)
>
> But sure, saving opcode space, makes sense to me.
>
> Btw, I'd still be interested about this:
>
> > +static inline void clwb(volatile void *__p)
> > +{
> > + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
>
> Any particular reason for using 0x3e as a prefix to have the insns be
> the same size or is it simply because CLFLUSH can stomach it?

Ah, sorry, I was still responding to your first mail. :) Response
copied here to save searching:

Essentially we need one additional byte at the beginning of the
clflush so that we can flip it into a clflushopt by changing that byte
into a 0x66 prefix. Two options are to either insert a 1 byte
ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX. Both have no functional
effect with the plain clflush, but I've been told that executing a
clflush + prefix should be faster than executing a clflush + NOP.

I agree, this is useful info - I'll add it to the patch comments for v2.

Thank you for the feedback.

- Ross

2014-11-12 12:39:17

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> +static inline void clwb(volatile void *__p)
> +{
> + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
> + ".byte 0x66; clflush %P0",
> + X86_FEATURE_CLFLUSHOPT,
> + ".byte 0x66; xsaveopt %P0",

Btw, I'm afraid you're going to have to spell out those new instruction
mnemonics as they aren't supported by older compilers:

CC drivers/gpu/drm/drm_cache.o
{standard input}: Assembler messages:
{standard input}:63: Error: no such instruction: `xsaveopt (%rax)'
{standard input}:249: Error: no such instruction: `xsaveopt (%rdi)'
{standard input}:283: Error: no such instruction: `xsaveopt -1(%rdx)'
make[1]: *** [drivers/gpu/drm/drm_cache.o] Error 1
make: *** [drivers/gpu/drm/drm_cache.o] Error 2
[boris@etch:14:35:40:k:14)-> gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

This is on a very old guest I have which has this gcc in it and the
kernel supports everything gcc >= 3.2.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-12 13:38:52

by H. Peter Anvin

[permalink] [raw]

Subject: RE: [PATCH 3/6] x86: Add support for the clwb instruction

No, it doesn't. x86 requires 3.4+ at a minimum.

________________________________
From: Borislav Petkov
Sent: Wednesday, November 12, 2014 4:39:09 AM
To: Ross Zwisler
Cc: [email protected]; Anvin, H Peter; Ingo Molnar; Thomas Gleixner; David Airlie; [email protected]; [email protected]
Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> +static inline void clwb(volatile void *__p)
> +{
> + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
> + ".byte 0x66; clflush %P0",
> + X86_FEATURE_CLFLUSHOPT,
> + ".byte 0x66; xsaveopt %P0",

Btw, I'm afraid you're going to have to spell out those new instruction
mnemonics as they aren't supported by older compilers:

CC drivers/gpu/drm/drm_cache.o
{standard input}: Assembler messages:
{standard input}:63: Error: no such instruction: `xsaveopt (%rax)'
{standard input}:249: Error: no such instruction: `xsaveopt (%rdi)'
{standard input}:283: Error: no such instruction: `xsaveopt -1(%rdx)'
make[1]: *** [drivers/gpu/drm/drm_cache.o] Error 1
make: *** [drivers/gpu/drm/drm_cache.o] Error 2
[boris@etch:14:35:40:k:14)-> gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

This is on a very old guest I have which has this gcc in it and the
kernel supports everything gcc >= 3.2.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-12 14:12:58

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Wed, Nov 12, 2014 at 01:38:45PM +0000, Anvin, H Peter wrote:
> No, it doesn't. x86 requires 3.4+ at a minimum.

The only test I see is:

#if GCC_VERSION < 30200
# error Sorry, your compiler is too old - please upgrade it.
#endif

And even if we do require 3.4, the build fails with 4.1+ so...

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-13 03:14:26

[permalink] [raw]

Subject: Re: [PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> If clwb is available on the system, use it in drm_clflush_virt_range.
> If clwb is not available, fall back to clflushopt if you can.
> If clflushopt is not supported, fall all the way back to clflush.

I don't know exactly what drm_clflush_virt_range (and the other
functions you're modifying similarly) are for, but it seems plausible to
me that they're used before reads to make sure that non-coherent memory
sees updated data. If that's true, then this will break it.

But maybe all the users are write to coherent memory that just need to
ensure that whatever's backing the memory knows about the write.

FWIW, it may make sense to rename this function to drm_clwb_virt_range
if you make this change.

--Andy

>
> Signed-off-by: Ross Zwisler <[email protected]>
> Cc: H Peter Anvin <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: David Airlie <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> drivers/gpu/drm/drm_cache.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index aad9d82..84e9a04 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -138,8 +138,8 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> void *end = addr + length;
> mb();
> for (; addr < end; addr += boot_cpu_data.x86_clflush_size)
> - clflushopt(addr);
> - clflushopt(end - 1);
> + clwb(addr);
> + clwb(end - 1);
> mb();
> return;
> }
>

2014-11-13 03:25:42

[permalink] [raw]

Subject: Re: [PATCH 1/6] x86: Add support for the pcommit instruction

On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> Add support for the new pcommit instruction. This instruction was
> announced in the document "Intel Architecture Instruction Set Extensions
> Programming Reference" with reference number 319433-022.
>
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>
> Signed-off-by: Ross Zwisler <[email protected]>
> Cc: H Peter Anvin <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: David Airlie <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> arch/x86/include/asm/cpufeature.h | 1 +
> arch/x86/include/asm/special_insns.h | 6 ++++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 0bb1335..b3e6b89 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -225,6 +225,7 @@
> #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
> #define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */
> #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */
> +#define X86_FEATURE_PCOMMIT ( 9*32+22) /* PCOMMIT instruction */
> #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
> #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
> #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index e820c08..1709a2e 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -199,6 +199,12 @@ static inline void clflushopt(volatile void *__p)
> "+m" (*(volatile char __force *)__p));
> }
>
> +static inline void pcommit(void)
> +{
> + alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
> + X86_FEATURE_PCOMMIT);
> +}
> +

Should this patch add the feature bit and cpuinfo entry to go with it?

--Andy

2014-11-13 11:20:30

[permalink] [raw]

Subject: Re: [PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

On Wed, Nov 12, 2014 at 07:14:21PM -0800, Andy Lutomirski wrote:
> On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> > If clwb is available on the system, use it in drm_clflush_virt_range.
> > If clwb is not available, fall back to clflushopt if you can.
> > If clflushopt is not supported, fall all the way back to clflush.
>
> I don't know exactly what drm_clflush_virt_range (and the other
> functions you're modifying similarly) are for, but it seems plausible to
> me that they're used before reads to make sure that non-coherent memory
> sees updated data. If that's true, then this will break it.

Why would it break it? The updated cachelines will be in memory and
subsequent reads will be serviced from the cache instead from going to
memory as it is not invalidated as it would be by CLFLUSH.

/me is puzzled.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-13 16:38:47

[permalink] [raw]

Subject: Re: [PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

On Nov 13, 2014 3:20 AM, "Borislav Petkov" <[email protected]> wrote:
>
> On Wed, Nov 12, 2014 at 07:14:21PM -0800, Andy Lutomirski wrote:
> > On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> > > If clwb is available on the system, use it in drm_clflush_virt_range.
> > > If clwb is not available, fall back to clflushopt if you can.
> > > If clflushopt is not supported, fall all the way back to clflush.
> >
> > I don't know exactly what drm_clflush_virt_range (and the other
> > functions you're modifying similarly) are for, but it seems plausible to
> > me that they're used before reads to make sure that non-coherent memory
> > sees updated data. If that's true, then this will break it.
>
> Why would it break it? The updated cachelines will be in memory and
> subsequent reads will be serviced from the cache instead from going to
> memory as it is not invalidated as it would be by CLFLUSH.
>
> /me is puzzled.

Suppose you map some device memory WB, and then the device
non-coherently updates. If you want the CPU to see it, you need
clflush or clflushopt. Some architectures might do this for
dma_sync_single_for_cpu with DMA_FROM_DEVICE.

I'm not sure that such a thing exists on x86.

--Andy

2014-11-13 17:11:44

[permalink] [raw]

Subject: Re: [PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

On Thu, Nov 13, 2014 at 08:38:23AM -0800, Andy Lutomirski wrote:
> On Nov 13, 2014 3:20 AM, "Borislav Petkov" <[email protected]> wrote:
> >
> > On Wed, Nov 12, 2014 at 07:14:21PM -0800, Andy Lutomirski wrote:
> > > On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> > > > If clwb is available on the system, use it in drm_clflush_virt_range.
> > > > If clwb is not available, fall back to clflushopt if you can.
> > > > If clflushopt is not supported, fall all the way back to clflush.
> > >
> > > I don't know exactly what drm_clflush_virt_range (and the other
> > > functions you're modifying similarly) are for, but it seems plausible to
> > > me that they're used before reads to make sure that non-coherent memory
> > > sees updated data. If that's true, then this will break it.
> >
> > Why would it break it? The updated cachelines will be in memory and
> > subsequent reads will be serviced from the cache instead from going to
> > memory as it is not invalidated as it would be by CLFLUSH.
> >
> > /me is puzzled.
>
> Suppose you map some device memory WB, and then the device
> non-coherently updates. If you want the CPU to see it, you need
> clflush or clflushopt. Some architectures might do this for
> dma_sync_single_for_cpu with DMA_FROM_DEVICE.

Ah, you're talking about the other way around - the device does the
writes. Well, the usage sites are all in i915*, maybe we should ask
them - it looks to me like this is only the CPU making stuff visible in
the shared buffer but I don't know that code... intel-gfx CCed although
dri-devel is already on CC.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-13 17:38:56

by Ville Syrjälä

[permalink] [raw]

Subject: Re: [PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

On Thu, Nov 13, 2014 at 06:11:33PM +0100, Borislav Petkov wrote:
> On Thu, Nov 13, 2014 at 08:38:23AM -0800, Andy Lutomirski wrote:
> > On Nov 13, 2014 3:20 AM, "Borislav Petkov" <[email protected]> wrote:
> > >
> > > On Wed, Nov 12, 2014 at 07:14:21PM -0800, Andy Lutomirski wrote:
> > > > On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> > > > > If clwb is available on the system, use it in drm_clflush_virt_range.
> > > > > If clwb is not available, fall back to clflushopt if you can.
> > > > > If clflushopt is not supported, fall all the way back to clflush.
> > > >
> > > > I don't know exactly what drm_clflush_virt_range (and the other
> > > > functions you're modifying similarly) are for, but it seems plausible to
> > > > me that they're used before reads to make sure that non-coherent memory
> > > > sees updated data. If that's true, then this will break it.
> > >
> > > Why would it break it? The updated cachelines will be in memory and
> > > subsequent reads will be serviced from the cache instead from going to
> > > memory as it is not invalidated as it would be by CLFLUSH.
> > >
> > > /me is puzzled.
> >
> > Suppose you map some device memory WB, and then the device
> > non-coherently updates. If you want the CPU to see it, you need
> > clflush or clflushopt. Some architectures might do this for
> > dma_sync_single_for_cpu with DMA_FROM_DEVICE.
>
> Ah, you're talking about the other way around - the device does the
> writes. Well, the usage sites are all in i915*, maybe we should ask
> them - it looks to me like this is only the CPU making stuff visible in
> the shared buffer but I don't know that code... intel-gfx CCed although
> dri-devel is already on CC.

We use it both ways in i915. So please don't break it.

--
Ville Syrj?l?
Intel OTC

2014-11-13 17:47:44

[permalink] [raw]

Subject: Re: [PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

On Thu, Nov 13, 2014 at 07:33:54PM +0200, Ville Syrjälä wrote:
> We use it both ways in i915. So please don't break it.

Haha, we started from Intel with Ross' patch and made a full circle
back. Maybe you guys should talk about it.

LOL. :-)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-13 17:48:00

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Wed, 2014-11-12 at 15:12 +0100, Borislav Petkov wrote:
> On Wed, Nov 12, 2014 at 01:38:45PM +0000, Anvin, H Peter wrote:
> > No, it doesn't. x86 requires 3.4+ at a minimum.
>
> The only test I see is:
>
> #if GCC_VERSION < 30200
> # error Sorry, your compiler is too old - please upgrade it.
> #endif
>
> And even if we do require 3.4, the build fails with 4.1+ so...

Ah, dang, you're right. Okay, I'll figure out how to do this without
using xsaveopt.

Thank you for pointing this out.

- Ross

2014-11-13 18:14:18

by Ville Syrjälä

[permalink] [raw]

Subject: Re: [PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

On Thu, Nov 13, 2014 at 06:47:34PM +0100, Borislav Petkov wrote:
> On Thu, Nov 13, 2014 at 07:33:54PM +0200, Ville Syrj?l? wrote:
> > We use it both ways in i915. So please don't break it.
>
> Haha, we started from Intel with Ross' patch and made a full circle
> back. Maybe you guys should talk about it.
>
> LOL. :-)

Indeed. The problem I see with these patches is that they don't actually
tell what the new instruction does, so a casual glance doesn't really
raise any red flags. Another excuse I can use is that I just got used
to the fact that the x86 hasn't historically bothered separating
invalidate and writeback and just does both. In my previous life on the
dark^Warm side I did actually know the difference :)

But there's plenty of blame to go around to the other side of the fence
too. We should have documented what we expect from these functions.
Currently you just have to know/guess, and that's just not good enough.

--
Ville Syrj?l?
Intel OTC

2014-11-13 18:44:04

by Ville Syrjälä

[permalink] [raw]

Subject: Re: [PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

On Thu, Nov 13, 2014 at 07:33:54PM +0200, Ville Syrj?l? wrote:
> On Thu, Nov 13, 2014 at 06:11:33PM +0100, Borislav Petkov wrote:
> > On Thu, Nov 13, 2014 at 08:38:23AM -0800, Andy Lutomirski wrote:
> > > On Nov 13, 2014 3:20 AM, "Borislav Petkov" <[email protected]> wrote:
> > > >
> > > > On Wed, Nov 12, 2014 at 07:14:21PM -0800, Andy Lutomirski wrote:
> > > > > On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> > > > > > If clwb is available on the system, use it in drm_clflush_virt_range.
> > > > > > If clwb is not available, fall back to clflushopt if you can.
> > > > > > If clflushopt is not supported, fall all the way back to clflush.
> > > > >
> > > > > I don't know exactly what drm_clflush_virt_range (and the other
> > > > > functions you're modifying similarly) are for, but it seems plausible to
> > > > > me that they're used before reads to make sure that non-coherent memory
> > > > > sees updated data. If that's true, then this will break it.
> > > >
> > > > Why would it break it? The updated cachelines will be in memory and
> > > > subsequent reads will be serviced from the cache instead from going to
> > > > memory as it is not invalidated as it would be by CLFLUSH.
> > > >
> > > > /me is puzzled.
> > >
> > > Suppose you map some device memory WB, and then the device
> > > non-coherently updates. If you want the CPU to see it, you need
> > > clflush or clflushopt. Some architectures might do this for
> > > dma_sync_single_for_cpu with DMA_FROM_DEVICE.
> >
> > Ah, you're talking about the other way around - the device does the
> > writes. Well, the usage sites are all in i915*, maybe we should ask
> > them - it looks to me like this is only the CPU making stuff visible in
> > the shared buffer but I don't know that code... intel-gfx CCed although
> > dri-devel is already on CC.
>
> We use it both ways in i915. So please don't break it.

Actually to clarify a bit, I think we shouldn't actually need the
invalidate part for any modern platform with shared LLC. Apart from the
display those tend to be fully coherent, so we only have to care about
the display controller not seeing stale data in memory.

I have no idea which platforms support this instruction. If
Baytrail/Braswell/Cherrytrail are on the list, then we have a problem.
Otherwise we should probably be fine with it. But I'm not 100% sure
about any future platforms that are still under wraps.

Also ttm seems to use some of this stuff. Not sure what they expect
from it.

--
Ville Syrj?l?
Intel OTC

2014-11-14 21:07:39

[permalink] [raw]

Subject: Re: [PATCH 1/6] x86: Add support for the pcommit instruction

On Wed, 2014-11-12 at 19:25 -0800, Andy Lutomirski wrote:
> On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> > Add support for the new pcommit instruction. This instruction was
> > announced in the document "Intel Architecture Instruction Set Extensions
> > Programming Reference" with reference number 319433-022.
> >
> > https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> >
> > Signed-off-by: Ross Zwisler <[email protected]>
> > Cc: H Peter Anvin <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: David Airlie <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > ---
> > arch/x86/include/asm/cpufeature.h | 1 +
> > arch/x86/include/asm/special_insns.h | 6 ++++++
> > 2 files changed, 7 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> > index 0bb1335..b3e6b89 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -225,6 +225,7 @@
> > #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
> > #define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */
> > #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */
> > +#define X86_FEATURE_PCOMMIT ( 9*32+22) /* PCOMMIT instruction */
> > #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
> > #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
> > #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
> > diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> > index e820c08..1709a2e 100644
> > --- a/arch/x86/include/asm/special_insns.h
> > +++ b/arch/x86/include/asm/special_insns.h
> > @@ -199,6 +199,12 @@ static inline void clflushopt(volatile void *__p)
> > "+m" (*(volatile char __force *)__p));
> > }
> >
> > +static inline void pcommit(void)
> > +{
> > + alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
> > + X86_FEATURE_PCOMMIT);
> > +}
> > +
>
> Should this patch add the feature bit and cpuinfo entry to go with it?
>
> --Andy

I think this patch does everything we need? The text for cpuinfo is
auto-generated in arch/x86/kernel/cpu/capflags.c from the flags defined
in arch/x86/include/asm/cpufeature.h, I think. Here's what I get in
cpuinfo on my system with a faked-out CPUID saying that clwb and pcommit
are present:

$ grep 'flags' /proc/cpuinfo
flags : fpu <snip> erms pcommit clflushopt clwb xsaveopt

The X86_FEATURE_CLWB and X86_FEATURE_PCOMMIT flags are being set up
according to what's in CPUID, and the proper alternatives are being
triggered. I stuck some debug code in the alternatives code to see what
was being patched in the presence and absence of each of the flags.

Is there something else I'm missing?

Thanks,
- Ross

2014-11-14 21:09:26