From: Samuel Neves
Date: Sun, 18 Nov 2018 14:03:20 +0000
Subject: Re: [PATCH v3 1/2] x86/fpu: track AVX-512 usage of tasks
To: aubrey.li@linux.intel.com, dave.hansen@intel.com, aubrey.li@intel.com,
    Thomas Gleixner, Ingo Molnar, peterz@infradead.org, "H. Peter Anvin"
Cc: ak@linux.intel.com, tim.c.chen@linux.intel.com, arjan@linux.intel.com,
    linux-kernel@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/17/18 12:36 AM, Li, Aubrey wrote:
> On 2018/11/17 7:10, Dave Hansen wrote:
>> Just to be clear: there are 3 AVX-512 XSAVE states:
>>
>>         XFEATURE_OPMASK,
>>         XFEATURE_ZMM_Hi256,
>>         XFEATURE_Hi16_ZMM,
>>
>> I honestly don't know what XFEATURE_OPMASK does. It does not appear to
>> be affected by VZEROUPPER (although VZEROUPPER's SDM documentation isn't
>> looking too great).

XFEATURE_OPMASK refers to the additional 8 mask registers used in
AVX512. These are more similar to general purpose registers than
vector registers, and should not be too relevant here.

>>
>> But, XFEATURE_ZMM_Hi256 is used for the upper 256 bits of the
>> registers ZMM0-ZMM15. Those are AVX-512-only registers. The only way
>> to get data into XFEATURE_ZMM_Hi256 state is by using AVX512 instructions.
>>
>> XFEATURE_Hi16_ZMM is the same. The only way to get state in there is
>> with AVX512 instructions.
>>
>> So, first of all, I think you *MUST* check XFEATURE_ZMM_Hi256 and
>> XFEATURE_Hi16_ZMM. That's without question.
>
> No, XFEATURE_ZMM_Hi256 does not request turbo license 2, so it's less
> interested to us.
>

I think Dave is right, and it's easy enough to check this. See the
attached program.
For the "high current" instruction vpmuludq operating on zmm0--zmm3
registers, we have (on a Skylake-SP Xeon Gold 5120)

       175,097      core_power.lvl0_turbo_license:u      ( +-  2.18% )
        41,185      core_power.lvl1_turbo_license:u      ( +-  1.55% )
    83,928,648      core_power.lvl2_turbo_license:u      ( +-  0.00% )

while for the same code operating on zmm28--zmm31 registers, we have

       163,507      core_power.lvl0_turbo_license:u      ( +-  6.85% )
        47,390      core_power.lvl1_turbo_license:u      ( +- 12.25% )
    83,927,735      core_power.lvl2_turbo_license:u      ( +-  0.00% )

In other words, the register index does not seem to matter at all for
turbo license purposes (this makes sense, considering these chips have
168 vector registers internally; zmm16--zmm31 are simply newly exposed
architectural registers).

We can also see that XFEATURE_Hi16_ZMM does not imply license 1 or 2;
we may be using xmm16--xmm31 purely for the convenient extra register
space. For example, cases 4 and 5 of the sample program (vpmuludq on
xmm0--xmm3 and on xmm28--xmm31, respectively):

    84,064,239      core_power.lvl0_turbo_license:u      ( +-  0.00% )
             0      core_power.lvl1_turbo_license:u
             0      core_power.lvl2_turbo_license:u

    84,060,625      core_power.lvl0_turbo_license:u      ( +-  0.00% )
             0      core_power.lvl1_turbo_license:u
             0      core_power.lvl2_turbo_license:u

So what's most important is the width of the vectors being used, not
the instruction set or the register index. Second to that is the
instruction type, namely whether those are "heavy" instructions.
Neither of these things can be accurately captured by the XSAVE state.

>>
>> It's probably *possible* to run AVX512 instructions by loading state
>> into the YMM register and then executing AVX512 instructions that only
>> write to memory and never to register state. That *might* allow
>> XFEATURE_Hi16_ZMM and XFEATURE_ZMM_Hi256 to stay in the init state, but
>> for the frequency to be affected since AVX512 instructions _are_
>> executing. But, there's no way to detect this situation from XSAVE
>> states themselves.
>>
>
> Andi should have more details on this.
> FWICT, not all AVX512 instructions
> has high current, those only touching memory do not cause notable frequency
> drop.

According to section 15.26 of the Intel optimization reference manual,
"heavy" instructions consist of floating-point and integer
multiplication. Moves, adds, logical operations, etc, will request at
most turbo license 1 when operating on zmm registers.

>
> Thanks,
> -Aubrey
>

Attachment: turbo.c

#include <stdlib.h>

#define INSN_LOOP_LO(insn, reg) do {                                   \
  asm volatile(                                                        \
    "mov $1<<24,%%rcx;"                                                \
    ".align 32;"                                                       \
    "1:"                                                               \
    #insn " " "%%" #reg "0" "," "%%" #reg "0" "," "%%" #reg "0" ";"    \
    #insn " " "%%" #reg "1" "," "%%" #reg "1" "," "%%" #reg "1" ";"    \
    #insn " " "%%" #reg "2" "," "%%" #reg "2" "," "%%" #reg "2" ";"    \
    #insn " " "%%" #reg "3" "," "%%" #reg "3" "," "%%" #reg "3" ";"    \
    "dec %%rcx;"                                                       \
    "jnz 1b;"                                                          \
    ::: "rcx"                                                          \
  );                                                                   \
} while(0);

#define INSN_LOOP_HI(insn, reg) do {                                   \
  asm volatile(                                                        \
    "mov $1<<24,%%rcx;"                                                \
    ".align 32;"                                                       \
    "1:"                                                               \
    #insn " " "%%" #reg "31" "," "%%" #reg "31" "," "%%" #reg "31" ";" \
    #insn " " "%%" #reg "30" "," "%%" #reg "30" "," "%%" #reg "30" ";" \
    #insn " " "%%" #reg "29" "," "%%" #reg "29" "," "%%" #reg "29" ";" \
    #insn " " "%%" #reg "28" "," "%%" #reg "28" "," "%%" #reg "28" ";" \
    "dec %%rcx;"                                                       \
    "jnz 1b;"                                                          \
    ::: "rcx"                                                          \
  );                                                                   \
} while(0);



int main(int argc, char ** argv) {
  int x = strtoul(argv[1], 0, 10);
  asm volatile("vzeroall");
  switch(x) {
  case 0:
    INSN_LOOP_LO(vpmuludq, zmm);
    break;
  case 1:
    INSN_LOOP_HI(vpmuludq, zmm);
    break;
  case 2:
    INSN_LOOP_LO(vpmuludq, ymm);
    break;
  case 3:
    INSN_LOOP_HI(vpmuludq, ymm);
    break;
  case 4:
    INSN_LOOP_LO(vpmuludq, xmm);
    break;
  case 5:
    INSN_LOOP_HI(vpmuludq, xmm);
    break;
  case 6:
    INSN_LOOP_LO(vpaddq, zmm);
    break;
  case 7:
    INSN_LOOP_HI(vpaddq, zmm);
    break;
  case 8:
    INSN_LOOP_LO(vpaddq, ymm);
    break;
  case 9:
    INSN_LOOP_HI(vpaddq, ymm);
    break;
  case 10:
    INSN_LOOP_LO(vpaddq, xmm);
    break;
  case 11:
    INSN_LOOP_HI(vpaddq, xmm);
    break;
  }
  return 0;
}