2016-11-08 08:53:36

by He Chen

[permalink] [raw]
Subject: [PATCH v4 0/2] cpuid: Support AVX512_4VNNIW and AVX512_4FMAPS for KVM guest

This patch series is going to add two new AVX512 features to KVM guest.
Since these two features are defined as scattered features in kernel,
some extra modification in kernel is included.

---
Changes in v4:
* divide patch into 2 parts, including modification in scattered.c and
support new AVX512 instructions for KVM.
* coding style.
* refine commit message.

Changes in v3:
* add a helper in scattered.c to get scattered leaf.

Changes in v2:
* add new macros for new AVX512 scattered features.
* add a cpuid_count_edx function to processor.h

He Chen (2):
cpuid: Add a helper in scattered.c to return cpuid leaf info
cpuid: add AVX512_4VNNIW and AVX512_4FMAPS instructions support

arch/x86/events/intel/pt.c | 45 ++++++++++++++------------------
arch/x86/include/asm/processor.h | 14 ++++++++++
arch/x86/kernel/cpu/scattered.c | 56 ++++++++++++++++++++++++++--------------
arch/x86/kernel/cpuid.c | 4 ---
arch/x86/kvm/cpuid.c | 14 +++++++++-
5 files changed, 83 insertions(+), 50 deletions(-)

--
2.7.4


2016-11-08 08:53:50

by He Chen

[permalink] [raw]
Subject: [PATCH v4 2/2] cpuid: add AVX512_4VNNIW and AVX512_4FMAPS instructions support

Add two new AVX512 instructions support for KVM guest.

AVX512_4VNNIW:
Vector instructions for deep learning enhanced word variable precision.

AVX512_4FMAPS:
Vector instructions for deep learning floating-point single precision.
---
arch/x86/kvm/cpuid.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index afa7bbb..ddcdf7c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -16,6 +16,7 @@
#include <linux/export.h>
#include <linux/vmalloc.h>
#include <linux/uaccess.h>
+#include <asm/processor.h>
#include <asm/fpu/internal.h> /* For use_eager_fpu. Ugh! */
#include <asm/user.h>
#include <asm/fpu/xstate.h>
@@ -65,6 +66,11 @@ u64 kvm_supported_xcr0(void)

#define F(x) bit(X86_FEATURE_##x)

+/* These are scattered features in cpufeatures.h. */
+#define KVM_CPUID_BIT_AVX512_4VNNIW 2
+#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KF(x) bit(KVM_CPUID_BIT_##x)
+
int kvm_update_cpuid(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@@ -376,6 +382,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* cpuid 7.0.ecx*/
const u32 kvm_cpuid_7_0_ecx_x86_features = F(PKU) | 0 /*OSPKE*/;

+ /* cpuid 7.0.edx*/
+ const u32 kvm_cpuid_7_0_edx_x86_features =
+ KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();

@@ -458,12 +468,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* PKU is not yet implemented for shadow paging. */
if (!tdp_enabled)
entry->ecx &= ~F(PKU);
+ entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+ entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
+ entry->edx = 0;
}
entry->eax = 0;
- entry->edx = 0;
break;
}
case 9:
--
2.7.4

2016-11-08 08:54:13

by He Chen

[permalink] [raw]
Subject: [PATCH v4 1/2] cpuid: Add a helper in scattered.c to return cpuid leaf info

Some sparse cpuid leafs are gathered in a fake leaf to save size of
x86_capability array in current code, but sometimes, kernel or other
modules (e.g. KVM cpuid enumeration) may need actual hardware leaf
information.

This patch adds a helper get_scattered_cpuid_leaf to rebuild actual
cpuid leaf, and it can be called outside by modules. Also, export
enum cpuid_regs in pt.c and scattered.c to enum cpuid_regs_idx in
processor.h.
---
arch/x86/events/intel/pt.c | 45 ++++++++++++++------------------
arch/x86/include/asm/processor.h | 14 ++++++++++
arch/x86/kernel/cpu/scattered.c | 56 ++++++++++++++++++++++++++--------------
arch/x86/kernel/cpuid.c | 4 ---
4 files changed, 70 insertions(+), 49 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index c5047b8..1c1b9fe 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -36,13 +36,6 @@ static DEFINE_PER_CPU(struct pt, pt_ctx);

static struct pt_pmu pt_pmu;

-enum cpuid_regs {
- CR_EAX = 0,
- CR_ECX,
- CR_EDX,
- CR_EBX
-};
-
/*
* Capabilities of Intel PT hardware, such as number of address bits or
* supported output schemes, are cached and exported to userspace as "caps"
@@ -64,21 +57,21 @@ static struct pt_cap_desc {
u8 reg;
u32 mask;
} pt_caps[] = {
- PT_CAP(max_subleaf, 0, CR_EAX, 0xffffffff),
- PT_CAP(cr3_filtering, 0, CR_EBX, BIT(0)),
- PT_CAP(psb_cyc, 0, CR_EBX, BIT(1)),
- PT_CAP(ip_filtering, 0, CR_EBX, BIT(2)),
- PT_CAP(mtc, 0, CR_EBX, BIT(3)),
- PT_CAP(ptwrite, 0, CR_EBX, BIT(4)),
- PT_CAP(power_event_trace, 0, CR_EBX, BIT(5)),
- PT_CAP(topa_output, 0, CR_ECX, BIT(0)),
- PT_CAP(topa_multiple_entries, 0, CR_ECX, BIT(1)),
- PT_CAP(single_range_output, 0, CR_ECX, BIT(2)),
- PT_CAP(payloads_lip, 0, CR_ECX, BIT(31)),
- PT_CAP(num_address_ranges, 1, CR_EAX, 0x3),
- PT_CAP(mtc_periods, 1, CR_EAX, 0xffff0000),
- PT_CAP(cycle_thresholds, 1, CR_EBX, 0xffff),
- PT_CAP(psb_periods, 1, CR_EBX, 0xffff0000),
+ PT_CAP(max_subleaf, 0, CPUID_EAX, 0xffffffff),
+ PT_CAP(cr3_filtering, 0, CPUID_EBX, BIT(0)),
+ PT_CAP(psb_cyc, 0, CPUID_EBX, BIT(1)),
+ PT_CAP(ip_filtering, 0, CPUID_EBX, BIT(2)),
+ PT_CAP(mtc, 0, CPUID_EBX, BIT(3)),
+ PT_CAP(ptwrite, 0, CPUID_EBX, BIT(4)),
+ PT_CAP(power_event_trace, 0, CPUID_EBX, BIT(5)),
+ PT_CAP(topa_output, 0, CPUID_ECX, BIT(0)),
+ PT_CAP(topa_multiple_entries, 0, CPUID_ECX, BIT(1)),
+ PT_CAP(single_range_output, 0, CPUID_ECX, BIT(2)),
+ PT_CAP(payloads_lip, 0, CPUID_ECX, BIT(31)),
+ PT_CAP(num_address_ranges, 1, CPUID_EAX, 0x3),
+ PT_CAP(mtc_periods, 1, CPUID_EAX, 0xffff0000),
+ PT_CAP(cycle_thresholds, 1, CPUID_EBX, 0xffff),
+ PT_CAP(psb_periods, 1, CPUID_EBX, 0xffff0000),
};

static u32 pt_cap_get(enum pt_capabilities cap)
@@ -213,10 +206,10 @@ static int __init pt_pmu_hw_init(void)

for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
- &pt_pmu.caps[CR_EAX + i*PT_CPUID_REGS_NUM],
- &pt_pmu.caps[CR_EBX + i*PT_CPUID_REGS_NUM],
- &pt_pmu.caps[CR_ECX + i*PT_CPUID_REGS_NUM],
- &pt_pmu.caps[CR_EDX + i*PT_CPUID_REGS_NUM]);
+ &pt_pmu.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM],
+ &pt_pmu.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM],
+ &pt_pmu.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM],
+ &pt_pmu.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM]);
}

ret = -ENOMEM;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..e7f8c62 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -137,6 +137,17 @@ struct cpuinfo_x86 {
u32 microcode;
};

+struct cpuid_regs {
+ u32 eax, ebx, ecx, edx;
+};
+
+enum cpuid_regs_idx {
+ CPUID_EAX = 0,
+ CPUID_EBX,
+ CPUID_ECX,
+ CPUID_EDX,
+};
+
#define X86_VENDOR_INTEL 0
#define X86_VENDOR_CYRIX 1
#define X86_VENDOR_AMD 2
@@ -178,6 +189,9 @@ extern void identify_secondary_cpu(struct cpuinfo_x86 *);
extern void print_cpu_info(struct cpuinfo_x86 *);
void print_cpu_msr(struct cpuinfo_x86 *);
extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
+extern u32 get_scattered_cpuid_leaf(unsigned int level,
+ unsigned int sub_leaf,
+ enum cpuid_regs_idx reg);
extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);

diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 1db8dc4..ef131ea 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -17,11 +17,17 @@ struct cpuid_bit {
u32 sub_leaf;
};

-enum cpuid_regs {
- CR_EAX = 0,
- CR_ECX,
- CR_EDX,
- CR_EBX
+/* Please keep the leaf sorted by cpuid_bit.level for faster search. */
+static const struct cpuid_bit cpuid_bits[] = {
+ { X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
+ { X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
+ { X86_FEATURE_INTEL_PT, CPUID_EBX, 25, 0x00000007, 0 },
+ { X86_FEATURE_AVX512_4VNNIW, CPUID_EDX, 2, 0x00000007, 0 },
+ { X86_FEATURE_AVX512_4FMAPS, CPUID_EDX, 3, 0x00000007, 0 },
+ { X86_FEATURE_HW_PSTATE, CPUID_EDX, 7, 0x80000007, 0 },
+ { X86_FEATURE_CPB, CPUID_EDX, 9, 0x80000007, 0 },
+ { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
+ { 0, 0, 0, 0, 0 }
};

void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
@@ -30,18 +36,6 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
u32 regs[4];
const struct cpuid_bit *cb;

- static const struct cpuid_bit cpuid_bits[] = {
- { X86_FEATURE_INTEL_PT, CR_EBX,25, 0x00000007, 0 },
- { X86_FEATURE_AVX512_4VNNIW, CR_EDX, 2, 0x00000007, 0 },
- { X86_FEATURE_AVX512_4FMAPS, CR_EDX, 3, 0x00000007, 0 },
- { X86_FEATURE_APERFMPERF, CR_ECX, 0, 0x00000006, 0 },
- { X86_FEATURE_EPB, CR_ECX, 3, 0x00000006, 0 },
- { X86_FEATURE_HW_PSTATE, CR_EDX, 7, 0x80000007, 0 },
- { X86_FEATURE_CPB, CR_EDX, 9, 0x80000007, 0 },
- { X86_FEATURE_PROC_FEEDBACK, CR_EDX,11, 0x80000007, 0 },
- { 0, 0, 0, 0, 0 }
- };
-
for (cb = cpuid_bits; cb->feature; cb++) {

/* Verify that the level is valid */
@@ -50,10 +44,34 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
max_level > (cb->level | 0xffff))
continue;

- cpuid_count(cb->level, cb->sub_leaf, &regs[CR_EAX],
- &regs[CR_EBX], &regs[CR_ECX], &regs[CR_EDX]);
+ cpuid_count(cb->level, cb->sub_leaf, &regs[CPUID_EAX],
+ &regs[CPUID_EBX], &regs[CPUID_ECX], &regs[CPUID_EDX]);

if (regs[cb->reg] & (1 << cb->bit))
set_cpu_cap(c, cb->feature);
}
}
+
+u32 get_scattered_cpuid_leaf(unsigned int level, unsigned int sub_leaf,
+ enum cpuid_regs_idx reg)
+{
+ const struct cpuid_bit *cb;
+ u32 cpuid_val = 0;
+
+ for (cb = cpuid_bits; cb->feature; cb++) {
+
+ if (level > cb->level)
+ continue;
+
+ if (level < cb->level)
+ break;
+
+ if (reg == cb->reg && sub_leaf == cb->sub_leaf) {
+ if (cpu_has(&boot_cpu_data, cb->feature))
+ cpuid_val |= BIT(cb->bit);
+ }
+ }
+
+ return cpuid_val;
+}
+EXPORT_SYMBOL_GPL(get_scattered_cpuid_leaf);
diff --git a/arch/x86/kernel/cpuid.c b/arch/x86/kernel/cpuid.c
index 2836de3..9095c80 100644
--- a/arch/x86/kernel/cpuid.c
+++ b/arch/x86/kernel/cpuid.c
@@ -46,10 +46,6 @@

static struct class *cpuid_class;

-struct cpuid_regs {
- u32 eax, ebx, ecx, edx;
-};
-
static void cpuid_smp_cpuid(void *cmd_block)
{
struct cpuid_regs *cmd = (struct cpuid_regs *)cmd_block;
--
2.7.4

2016-11-08 10:30:16

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] cpuid: Add a helper in scattered.c to return cpuid leaf info

On Tue, Nov 08, 2016 at 04:52:27PM +0800, He Chen wrote:
> Some sparse cpuid leafs are gathered in a fake leaf to save size of

s/cpuid/CPUID/

> x86_capability array in current code, but sometimes, kernel or other
> modules (e.g. KVM cpuid enumeration) may need actual hardware leaf
> information.
>
> This patch adds a helper get_scattered_cpuid_leaf to rebuild actual

get_scattered_cpuid_leaf()

> cpuid leaf, and it can be called outside by modules.

s/cpuid/CPUID/

> Also, export
> enum cpuid_regs in pt.c and scattered.c to enum cpuid_regs_idx in
> processor.h.

No need for that last sentence - it is obvious when looking at the diff
itself.

> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 984a7bf..e7f8c62 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -137,6 +137,17 @@ struct cpuinfo_x86 {
> u32 microcode;
> };
>
> +struct cpuid_regs {
> + u32 eax, ebx, ecx, edx;
> +};

Why do you export this? It is used in cpuid.c only.

> +enum cpuid_regs_idx {
> + CPUID_EAX = 0,
> + CPUID_EBX,
> + CPUID_ECX,
> + CPUID_EDX,
> +};
> +
> #define X86_VENDOR_INTEL 0
> #define X86_VENDOR_CYRIX 1
> #define X86_VENDOR_AMD 2
> @@ -178,6 +189,9 @@ extern void identify_secondary_cpu(struct cpuinfo_x86 *);
> extern void print_cpu_info(struct cpuinfo_x86 *);
> void print_cpu_msr(struct cpuinfo_x86 *);
> extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
> +extern u32 get_scattered_cpuid_leaf(unsigned int level,
> + unsigned int sub_leaf,
> + enum cpuid_regs_idx reg);
> extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
> extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);
>
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 1db8dc4..ef131ea 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -17,11 +17,17 @@ struct cpuid_bit {
> u32 sub_leaf;
> };
>
> -enum cpuid_regs {
> - CR_EAX = 0,
> - CR_ECX,
> - CR_EDX,
> - CR_EBX
> +/* Please keep the leaf sorted by cpuid_bit.level for faster search. */
> +static const struct cpuid_bit cpuid_bits[] = {
> + { X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
> + { X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
> + { X86_FEATURE_INTEL_PT, CPUID_EBX, 25, 0x00000007, 0 },
> + { X86_FEATURE_AVX512_4VNNIW, CPUID_EDX, 2, 0x00000007, 0 },
> + { X86_FEATURE_AVX512_4FMAPS, CPUID_EDX, 3, 0x00000007, 0 },
> + { X86_FEATURE_HW_PSTATE, CPUID_EDX, 7, 0x80000007, 0 },
> + { X86_FEATURE_CPB, CPUID_EDX, 9, 0x80000007, 0 },
> + { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
> + { 0, 0, 0, 0, 0 }
> };
>
> void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
> @@ -30,18 +36,6 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
> u32 regs[4];
> const struct cpuid_bit *cb;
>
> - static const struct cpuid_bit cpuid_bits[] = {
> - { X86_FEATURE_INTEL_PT, CR_EBX,25, 0x00000007, 0 },
> - { X86_FEATURE_AVX512_4VNNIW, CR_EDX, 2, 0x00000007, 0 },
> - { X86_FEATURE_AVX512_4FMAPS, CR_EDX, 3, 0x00000007, 0 },
> - { X86_FEATURE_APERFMPERF, CR_ECX, 0, 0x00000006, 0 },
> - { X86_FEATURE_EPB, CR_ECX, 3, 0x00000006, 0 },
> - { X86_FEATURE_HW_PSTATE, CR_EDX, 7, 0x80000007, 0 },
> - { X86_FEATURE_CPB, CR_EDX, 9, 0x80000007, 0 },
> - { X86_FEATURE_PROC_FEEDBACK, CR_EDX,11, 0x80000007, 0 },
> - { 0, 0, 0, 0, 0 }
> - };
> -
> for (cb = cpuid_bits; cb->feature; cb++) {
>
> /* Verify that the level is valid */

@tip guys: this will conflict with the CAT changes. I've resolved it
this way by keeping the cpuid_bit.level sorted.

@@ -17,11 +17,20 @@ struct cpuid_bit {
u32 sub_leaf;
};

-enum cpuid_regs {
- CR_EAX = 0,
- CR_ECX,
- CR_EDX,
- CR_EBX
+/* Please keep the leaf sorted by cpuid_bit.level for faster search. */
+static const struct cpuid_bit cpuid_bits[] = {
+ { X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
+ { X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
+ { X86_FEATURE_INTEL_PT, CPUID_EBX, 25, 0x00000007, 0 },
+ { X86_FEATURE_AVX512_4VNNIW, CPUID_EDX, 2, 0x00000007, 0 },
+ { X86_FEATURE_AVX512_4FMAPS, CPUID_EDX, 3, 0x00000007, 0 },
+ { X86_FEATURE_CAT_L3, CPUID_EBX, 1, 0x00000010, 0 },
+ { X86_FEATURE_CAT_L2, CPUID_EBX, 2, 0x00000010, 0 },
+ { X86_FEATURE_CDP_L3, CPUID_ECX, 2, 0x00000010, 1 },
+ { X86_FEATURE_HW_PSTATE, CPUID_EDX, 7, 0x80000007, 0 },
+ { X86_FEATURE_CPB, CPUID_EDX, 9, 0x80000007, 0 },
+ { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
+ { 0, 0, 0, 0, 0 }
};

void init_scattered_cpuid_features(struct cpuinfo_x86 *c)


--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2016-11-08 10:33:27

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] cpuid: add AVX512_4VNNIW and AVX512_4FMAPS instructions support

On Tue, Nov 08, 2016 at 04:52:28PM +0800, He Chen wrote:
> Add two new AVX512 instructions support for KVM guest.
>
> AVX512_4VNNIW:
> Vector instructions for deep learning enhanced word variable precision.
>
> AVX512_4FMAPS:
> Vector instructions for deep learning floating-point single precision.

Whoops, first time I'm catching something like this, thanks
checkpatch.pl:

ERROR: Missing Signed-off-by: line(s)

<--- ?!

> ---
> arch/x86/kvm/cpuid.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2016-11-08 15:14:30

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] cpuid: Add a helper in scattered.c to return cpuid leaf info

On Tue, 8 Nov 2016, Borislav Petkov wrote:
> On Tue, Nov 08, 2016 at 04:52:27PM +0800, He Chen wrote:
> > +struct cpuid_regs {
> > + u32 eax, ebx, ecx, edx;
> > +};
>
> Why do you export this? It is used in cpuid.c only.

Because I asked for it. We should have this stuff in one place to avoid the
next name clash.

What's worse with that patch is, that it does two things in one go:

- cleanup that whole cpuid_regs mess

- add the new function

That should clearly be seperated into two patches.

Thanks,

tglx