Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp617760imu; Fri, 9 Nov 2018 03:22:08 -0800 (PST) X-Google-Smtp-Source: AJdET5fPNQLIDg1xytQTE3RbfqVOG3Z+ft6atToiAUusuMbRigh3wxFXnbdlRFWGnlodJEzWTsAI X-Received: by 2002:a17:902:b03:: with SMTP id 3-v6mr8584314plq.233.1541762528619; Fri, 09 Nov 2018 03:22:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1541762528; cv=none; d=google.com; s=arc-20160816; b=roG9/GC86Nh7nreoDqbQrYhk6+lPrbjXSCprhZNWDPn2p5OzaPwbnLx43tSDo3nkB9 OcYUbRzteGm1dM4TXOxkyvaUA7iYfjmOh3ZM7/JdMdymG1odZ64AyUKzndOizhk6j0qD Lb9oSm5iVLAldkuFNMg17YSAatMGXMveRZ7vF6DC6Kbgppvnwz73csFLrg52dtd7hfzo cZy6nyDpK/ljDzv50QvBqO4ujnn0Nz3nN9vCoQyYgGl0F42aKKk2JfVcGGDt3zXAIUTW 4EhBz1pFKpxuAxfvSxlHw5K6oWJrdbSPQ2PD1ar+7Wq6Q9D/XiePkAAlv/rJ3Lvm3keX nsKA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date; bh=2/fXwFha5qFU+dbbmUUkaPt6Q/cCKGi25Zb1VkguzHw=; b=VdlDEz+cXTLtxMpTl5x2AK45P/7pa6VYwOznLV9VyZFWf3hhNFwy1Y3hSPcka1gPCR q9/0uvI4PQg9Hwvjijr8A67fCa1Y44ttuNfPWTVl344kl/GQS0yH4ndscRzZ+K/Zx5SL 79Tik2Edi9Bwdw+F20BCzMlpKnwvmQGUgQV4kDftPux2djVmFPb47guSCRhYFXsSCANa vA3zVLw5F2BbBzTlsnVVo3SAwqFWaGRMZV+Cypc5jqZZR4oL4YrdT4Z4WtcF0MizcPaR WGxggtlFt3l7ETTRyCpiU52X9HZlywJw8QqIQPDKsARaZIVlqD0uaWeIponIPKPa2jn6 TU1g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z10-v6si6214728pgv.487.2018.11.09.03.21.52; Fri, 09 Nov 2018 03:22:08 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727861AbeKIVBl (ORCPT + 99 others); Fri, 9 Nov 2018 16:01:41 -0500 Received: from Galois.linutronix.de ([146.0.238.70]:46562 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727532AbeKIVBl (ORCPT ); Fri, 9 Nov 2018 16:01:41 -0500 Received: from hsi-kbw-5-158-153-52.hsi19.kabel-badenwuerttemberg.de ([5.158.153.52] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1gL4qb-00025s-3E; Fri, 09 Nov 2018 12:21:09 +0100 Date: Fri, 9 Nov 2018 12:21:08 +0100 (CET) From: Thomas Gleixner To: Aubrey Li cc: mingo@redhat.com, peterz@infradead.org, hpa@zytor.com, ak@linux.intel.com, tim.c.chen@linux.intel.com, arjan@linux.intel.com, linux-kernel@vger.kernel.org, Aubrey Li Subject: Re: [RFC PATCH v2 1/2] x86/fpu: detect AVX task In-Reply-To: <1541610982-33478-1-git-send-email-aubrey.li@intel.com> Message-ID: References: <1541610982-33478-1-git-send-email-aubrey.li@intel.com> User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Aubrey, On Thu, 8 Nov 2018, Aubrey Li wrote: > Subject: .... x86/fpu: detect AVX task What is an AVX task? I know what you mean, but for the casual reader this is not very informative. So something like: x86/fpu: Track AVX usage of tasks would be more informative and precise. The mechanism you add is tracking the state and not just detecting in. > XSAVES and its variants use init optimization to reduce the amount of > data that they save to memory during context switch. Init optimization > uses the state component bitmap to denote if a component is in its init > configuration. We use this information to detect if a task contains AVX > instructions. Please avoid 'We use..'. Changelogs should be factual and precise and not be written as first-person narrative. Aside of that, you very well explained how XSAVES optimization works, but that's only a small part of the picture. The changelog should also contain a justification why this is necessary in the first place along with more missing bits and pieces. And please use paragraphs to structure the changelog instead of having one big lump. Something like this: User space tools which do automated task placement need information about AVX usage of tasks, because of .... The extended control register (XCR) allows access to the XINUSE state-component bitmap, which allows software to discover the state of the init optimization used by XSAVEOPT and XSAVES. Set bits in the bitmap denote the usage of AVX, SIMD, FPU and other components. The XSAVE variants store only the state of used components to speed up the operation. The XGETBV instruction, if supported, allows software to read the state-component bitmap and use this information to detect the usage of the tracked components. Add accessor functions and per task state tracking to context switch. The tracking turns on the usage flag immediately, but requires 3 consecutive context switches with no usage to clear it. This decay is required because of .... You surely had all this information in your head when writing the code and the changelog, so you could have spared me to dig all this out of the SDM. > +/* > + * This function is called during context switch to update AVX component state > + */ > +static inline void update_avx_state(struct avx_state *avx) > +{ > + /* > + * Check if XGETBV with ECX = 1 supported. XGETBV with ECX = 1 > + * returns the logical-AND of XCR0 and XINUSE. XINUSE is a bitmap > + * by which the processor tracks the status of various components. > + */ > + if (!use_xgetbv1()) { > + avx->state = 0; avx->state should be initialized to 0 when the task starts, so why does it need to be cleared when XGETBV is not supported? Also this is the only usage site of use_xgetbv1(). So please open code it here and avoid the extra inline which does not provide any extra value in this case. > + return; > + } > + /* > + * XINUSE is dynamic to track component state because VZEROUPPER > + * happens on every function end and reset the bitmap to the > + * initial configuration. > + * > + * State decay is introduced to solve the race condition between > + * context switch and a function end. State is aggressively set > + * once it's detected but need to be cleared by decay 3 context > + * switches I have no idea what _the_ race condition between context switch and a function end is about. Probably most readers wont have. > + */ > + if (xgetbv(XINUSE_STATE_BITMAP_INDEX) & XFEATURE_MASK_Hi16_ZMM) { In the changelog and also in the code you say AVX (avx->state), but this actually looks only for Hi16_ZMM state, which are the upper 16 AVX512 registers. So again, this wants to be documented in the changelog along with an explanation WHY this check is restricted to Hi16_ZMM state. And please rename the variable accordingly. This is confusing at best. > + avx->state = 1; > + avx->decay_count = AVX_STATE_DECAY_COUNT; > + } else { > + if (avx->decay_count) > + avx->decay_count--; > + else > + avx->state = 0; > + } Is there a reason why you need two variables for this? if (xgetbv(XINUSE_STATE_BITMAP_INDEX) & XFEATURE_MASK_Hi16_ZMM) tsk->hi16zmm_usage = HI16ZMM_DECAY_COUNT; else if (tsk->hi16zmm_usage) tsk->hi16zmm_usage--; and the function which exposes it later to user space can just check whether tsk->hi16zmm_usage is 0 or not. This is context switch and we really want to avoid every pointless instruction we can. Thanks, tglx