Subject: Re: [RFC PATCH v2 1/2] x86/fpu: detect AVX task
To: Dave Hansen, Aubrey Li, tglx@linutronix.de, mingo@redhat.com,
 peterz@infradead.org, hpa@zytor.com
Cc: ak@linux.intel.com, tim.c.chen@linux.intel.com, arjan@linux.intel.com,
 linux-kernel@vger.kernel.org
References: <1541610982-33478-1-git-send-email-aubrey.li@intel.com>
From: "Li, Aubrey"
Message-ID: <9bcd88eb-9433-debf-b831-29e064e87522@linux.intel.com>
Date: Tue, 13 Nov 2018 14:42:38 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018/11/12 23:46, Dave Hansen wrote:
> On 11/11/18 9:38 PM, Li, Aubrey wrote:
>
>>> Do we want this, or do we want something more time-based?
>>>
>> This counter is introduced here to solve the race of context switch and
>> VZEROUPPER. 3 context switches mean the same thread is on-off CPU 3 times.
>> Due to scheduling latency, 3 jiffies could only happen AVX task on-off just
>> 1 time. So IMHO the context switches number is better here.
>
> Imagine we have a HZ=1000 system where AVX_STATE_DECAY_COUNT=3. That
> means that a task can be marked as a non-AVX-512-user after not using it
> for ~3 ms. But, with HZ=250, that's ~12ms.

Looking at it from the other side: if we set a 4 ms decay, then with
HZ=1000 the context switch count is 4, which means we get 4 chances to
maintain the AVX state, that is, we can filter out 4 init-state resets.
But with HZ=250 the context switch count is 1, so we get only 1 chance to
filter out an init-state reset.

>
> Also, don't forget that we have context switches from the timer
> interrupt, but also from normal old operations that sleep.
>
> Let's say our AVX-512 app was doing:
>
>     while (foo) {
>         do_avx_512();
>         read(pipe, buf, len);
>         read(pipe, buf, len);
>         read(pipe, buf, len);
>     }
>
> And all three pipe reads context-switched the task. That loop could
> finish in way under 3HZ, but still end up in do_avx_512() each time with
> fpu...avx->state=0.

Yeah, we are trying to make a prediction from the historical pattern, so
you can always construct a pattern that beats the prediction. But in
practice I measured tensorflow with AVX512 enabled, linpack with AVX512,
and a micro benchmark, and the current decay of 3 context switches works
well enough.

>
> BTW, I don't have a great solution for this.
> I was just pointing out
> one of the pitfalls from using context switch counts so strictly.

I really don't think time-based is better than the count in this case.

>>>> +/*
>>>>   * Highest level per task FPU state data structure that
>>>>   * contains the FPU register state plus various FPU
>>>>   * state fields:
>>>> @@ -303,6 +312,14 @@ struct fpu {
>>>>      unsigned char initialized;
>>>>
>>>>      /*
>>>> +     * @avx_state:
>>>> +     *
>>>> +     * This data structure indicates whether this context
>>>> +     * contains AVX states
>>>> +     */
>>>
>>> Yeah, that's precisely what fpu->state.xsave.xfeatures does. :)
>>>
>> I see, will refine in the next version
>
> One other thought about the new 'avx_state':
>
> fxregs_state (which is a part of the XSAVE state) has some padding and
> 'sw_reserved' areas. You *might* be able to steal some space there.
> Not that this is a huge space eater, but why waste the space if we don't
> have to?
>

IMHO, I'd prefer not to add anything extra to a data structure associated
with a hardware table. Let me try to work out a new version and see if it
satisfies you.

Thanks,
-Aubrey
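
To make the numbers in this thread concrete, here is a minimal userspace
sketch of a context-switch decay counter plus the HZ arithmetic from both
sides of the argument. It is not the code from the patch: the struct and
function names (avx_tracker, avx_tracker_switch, ticks_to_ms, ms_to_ticks)
are hypothetical, and it assumes the counter is reloaded whenever AVX state
is seen in use at a context switch and decremented otherwise. Only
AVX_STATE_DECAY_COUNT=3 comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Value discussed in the thread; everything else here is hypothetical. */
#define AVX_STATE_DECAY_COUNT 3

/* Illustrative per-task tracker, not the layout used by the patch. */
struct avx_tracker {
    /* Context switches left before the task stops counting as an AVX task. */
    unsigned int decay_count;
};

/*
 * Assumed behaviour at each context switch: reload the counter when the
 * saved state shows AVX in use, otherwise let it decay by one.
 */
static void avx_tracker_switch(struct avx_tracker *t, bool avx_in_use)
{
    if (avx_in_use)
        t->decay_count = AVX_STATE_DECAY_COUNT;
    else if (t->decay_count > 0)
        t->decay_count--;
}

/* The task is still treated as an AVX user while the counter is non-zero. */
static bool avx_tracker_is_avx_task(const struct avx_tracker *t)
{
    return t->decay_count > 0;
}

/* A decay of `ticks` timer ticks lasts ticks * 1000 / HZ ms. */
static unsigned int ticks_to_ms(unsigned int ticks, unsigned int hz)
{
    assert(hz > 0);
    return ticks * 1000u / hz;
}

/* A fixed decay window of `ms` milliseconds spans ms * HZ / 1000 ticks. */
static unsigned int ms_to_ticks(unsigned int ms, unsigned int hz)
{
    assert(hz > 0);
    return ms * hz / 1000u;
}
```

With these helpers, ticks_to_ms(3, 1000) and ticks_to_ms(3, 250) give the
~3 ms and ~12 ms figures, and ms_to_ticks(4, 1000) and ms_to_ticks(4, 250)
give the 4 chances versus 1 chance. Feeding the tracker one switch with
avx_in_use=true followed by three with avx_in_use=false models the
three-read() loop: the counter reaches zero before the next do_avx_512().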