Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934988AbcJRPBg (ORCPT ); Tue, 18 Oct 2016 11:01:36 -0400 Received: from mga11.intel.com ([192.55.52.93]:39054 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932186AbcJRPB2 (ORCPT ); Tue, 18 Oct 2016 11:01:28 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.31,362,1473145200"; d="scan'208";a="21256089" From: Piotr Luc To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, Dave Hansen , Andy Lutomirski , Borislav Petkov , Brian Gerst , Denys Vlasenko , "H . Peter Anvin" , Josh Poimboeuf , Linus Torvalds , Peter Zijlstra , Thomas Gleixner , Ingo Molnar Subject: [PATCH v3] x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features. Date: Tue, 18 Oct 2016 17:01:11 +0200 Message-Id: <20161018150111.29926-1-piotr.luc@intel.com> X-Mailer: git-send-email 2.10.1 In-Reply-To: <20161018130818.GA25251@gmail.com> References: <20161018130818.GA25251@gmail.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4359 Lines: 98 AVX512_4VNNIW - Vector instructions for deep learning enhanced word variable precision. AVX512_4FMAPS - Vector instructions for deep learning floating-point single precision. The new instructions are to be used in future Intel Xeon & Xeon Phi processors. The bits 2&3 of CPUID[level:0x07, EDX] inform that new instructions are supported by a processor and can be used by programs. The patch defines new feature flags to enumerate new instruction groups in /proc/cpuinfo accordingly to CPUID bits. Because correct xsave setup is required to use AVX512 instruction, the patch clears the new feature flags in CPU caps to inform programs not to use the instructions if the setup fails. The spec can be found in Intel Software Developer Manual (SDM) or in Instruction Set Extensions Programming Reference (ISE). The implementation is based on Table 2.8 "Information Returned by CPUID Instruction" in ISE, https://software.intel.com/sites/default/files/managed/69/78/319433-025.pdf. v2: Initialize new bits in the scattered group. Add reference to correct description of new feature bits. v3: Fix v2 info. Add short info what the patch does. Signed-off-by: Piotr Luc Cc: Dave Hansen Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ingo Molnar --- arch/x86/include/asm/cpufeatures.h | 2 ++ arch/x86/kernel/cpu/scattered.c | 2 ++ arch/x86/kernel/fpu/xstate.c | 2 ++ tools/arch/x86/include/asm/cpufeatures.h | 2 ++ 4 files changed, 8 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 92a8308..4ecbce9 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -195,6 +195,8 @@ #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ +#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ +#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ /* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c index 8cb57df..1db8dc4 100644 --- a/arch/x86/kernel/cpu/scattered.c +++ b/arch/x86/kernel/cpu/scattered.c @@ -32,6 +32,8 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c) static const struct cpuid_bit cpuid_bits[] = { { X86_FEATURE_INTEL_PT, CR_EBX,25, 0x00000007, 0 }, + { X86_FEATURE_AVX512_4VNNIW, CR_EDX, 2, 0x00000007, 0 }, + { X86_FEATURE_AVX512_4FMAPS, CR_EDX, 3, 0x00000007, 0 }, { X86_FEATURE_APERFMPERF, CR_ECX, 0, 0x00000006, 0 }, { X86_FEATURE_EPB, CR_ECX, 3, 0x00000006, 0 }, { X86_FEATURE_HW_PSTATE, CR_EDX, 7, 0x80000007, 0 }, diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 01567aa..7dbd480 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -73,6 +73,8 @@ void fpu__xstate_clear_all_cpu_caps(void) setup_clear_cpu_cap(X86_FEATURE_MPX); setup_clear_cpu_cap(X86_FEATURE_XGETBV1); setup_clear_cpu_cap(X86_FEATURE_PKU); + setup_clear_cpu_cap(X86_FEATURE_AVX512_4VNNIW); + setup_clear_cpu_cap(X86_FEATURE_AVX512_4FMAPS); } /* diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h index 92a8308..4ecbce9 100644 --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -195,6 +195,8 @@ #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ +#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ +#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ /* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ -- 2.10.1