From: Aubrey Li
To: tglx@linutronix.de, mingo@redhat.com, peterz@infradead.org, hpa@zytor.com
Cc: ak@linux.intel.com, tim.c.chen@linux.intel.com, dave.hansen@intel.com, arjan@linux.intel.com, aubrey.li@intel.com, linux-kernel@vger.kernel.org, Aubrey Li
Subject: [PATCH v4 1/2] x86/fpu: track AVX-512 usage of tasks
Date: Tue, 11 Dec 2018 08:24:47 +0800
Message-Id: <20181211002448.3520-1-aubrey.li@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

User space tools which do automated task placement need information about AVX-512
usage of tasks, because AVX-512 usage can lower the core's turbo frequency
and slow down the task running on the sibling CPU.

The XSAVE hardware structure has bits that indicate when valid state is
present in the registers unique to AVX-512 use. Use these bits to detect
when AVX-512 has been in use, and add per-task AVX-512 state tracking to
the context switch path.

The tracking turns on the usage flag at the next context switch of the
task, but requires 3 consecutive context switches with no usage to clear
it. This decay is required because well-written AVX-512 applications are
expected to clear this state when not actively using AVX-512 registers.

Although this mechanism is imprecise and can theoretically produce both
false positives and false negatives, it has been measured to be precise
enough to be useful under real-world workloads like TensorFlow and
LINPACK. If higher precision is required, user space tools should combine
it with PMU-based mechanisms.

Signed-off-by: Aubrey Li
Cc: Peter Zijlstra
Cc: Andi Kleen
Cc: Tim Chen
Cc: Dave Hansen
Cc: Arjan van de Ven
---
 arch/x86/include/asm/fpu/internal.h | 22 ++++++++++++++++++++++
 arch/x86/include/asm/fpu/types.h    |  8 ++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index a38bf5a1e37a..0da74d63ba14 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -275,6 +275,27 @@ static inline void copy_fxregs_to_kernel(struct fpu *fpu)
 		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
 		     : "memory")
 
+#define AVX512_STATE_DECAY_COUNT	3
+/*
+ * This function is called during context switch to update AVX512 state
+ */
+static inline void update_avx512_state(struct fpu *fpu)
+{
+	/*
+	 * AVX512 state is tracked here because its use is known to slow
+	 * the max clock speed of the core.
+	 *
+	 * However, AVX512-using tasks are expected to clear this state when
+	 * not actively using these registers. Thus, this tracking mechanism
+	 * can miss. To ensure that false-negatives do not immediately show
+	 * up, decay the usage count over time.
+	 */
+	if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
+		fpu->avx512_usage = AVX512_STATE_DECAY_COUNT;
+	else if (fpu->avx512_usage)
+		fpu->avx512_usage--;
+}
+
 /*
  * This function is called only during boot time when x86 caps are not set
  * up and alternative can not be used yet.
@@ -411,6 +432,7 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
 {
 	if (likely(use_xsave())) {
 		copy_xregs_to_kernel(&fpu->state.xsave);
+		update_avx512_state(fpu);
 		return 1;
 	}
 
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 202c53918ecf..313b134d3ca3 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -302,6 +302,14 @@ struct fpu {
 	 */
 	unsigned char			initialized;
 
+	/*
+	 * @avx512_usage:
+	 *
+	 * Records the usage of AVX512 registers. A value of non-zero is used
+	 * to indicate whether these AVX512 registers recently had valid state.
+	 */
+	unsigned char			avx512_usage;
+
 	/*
 	 * @state:
 	 *
-- 
2.17.1
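The decay rule in update_avx512_state() can be exercised outside the kernel. The sketch below is a user-space model, not kernel code: struct fpu_model, model_update_avx512_state() and replay_switches() are hypothetical names, and the XSAVE component numbers (opmask = 5, ZMM_Hi256 = 6, Hi16_ZMM = 7) are assumed to mirror the kernel's XFEATURE_* values. It feeds one saved xfeatures (XSTATE_BV) value per context switch and records whether the task would be reported as an AVX-512 user after each switch.

```c
#include <stdint.h>

/*
 * User-space model of update_avx512_state() from this patch.
 * Assumption: the XSAVE component numbers below (opmask = 5,
 * ZMM_Hi256 = 6, Hi16_ZMM = 7) mirror the kernel's XFEATURE_* values.
 */
#define AVX512_STATE_DECAY_COUNT 3

#define XFEATURE_MASK_OPMASK    (1ULL << 5)
#define XFEATURE_MASK_ZMM_Hi256 (1ULL << 6)
#define XFEATURE_MASK_Hi16_ZMM  (1ULL << 7)
#define XFEATURE_MASK_AVX512    (XFEATURE_MASK_OPMASK | \
                                 XFEATURE_MASK_ZMM_Hi256 | \
                                 XFEATURE_MASK_Hi16_ZMM)

/* Hypothetical stand-in for the two struct fpu fields the patch uses. */
struct fpu_model {
	uint64_t xfeatures;         /* XSTATE_BV as written by XSAVE */
	unsigned char avx512_usage; /* decay counter, 0 = not flagged */
};

/* Same decision logic as update_avx512_state() in the patch. */
static void model_update_avx512_state(struct fpu_model *fpu)
{
	if (fpu->xfeatures & XFEATURE_MASK_AVX512)
		fpu->avx512_usage = AVX512_STATE_DECAY_COUNT; /* re-arm */
	else if (fpu->avx512_usage)
		fpu->avx512_usage--;                          /* decay */
}

/*
 * Hypothetical driver: saved[i] is the xfeatures value saved at the
 * i-th context switch; flagged[i] receives 1 if the task would be
 * reported as an AVX-512 user after that switch, else 0.
 */
static void replay_switches(const uint64_t *saved, int n, int *flagged)
{
	struct fpu_model fpu = { 0, 0 };

	for (int i = 0; i < n; i++) {
		fpu.xfeatures = saved[i];
		model_update_avx512_state(&fpu);
		flagged[i] = fpu.avx512_usage != 0;
	}
}
```

With one switch that saves ZMM state followed by clean switches, the flag stays set for that switch and the next two clean ones, and clears on the third clean switch, which matches the three-switch decay described in the commit message. Any AVX-512 state seen during the decay re-arms the full count. A caller supplies its own main() to drive replay_switches().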