Subject: Re: [PATCH v11 2/3] x86,/proc/pid/status: Add AVX-512 usage elapsed time
To: Thomas Gleixner
Cc: mingo@redhat.com, peterz@infradead.org, hpa@zytor.com, ak@linux.intel.com,
 tim.c.chen@linux.intel.com, dave.hansen@intel.com, arjan@linux.intel.com,
 aubrey.li@intel.com, linux-kernel@vger.kernel.org
References: <20190213023748.6614-1-aubrey.li@linux.intel.com>
 <20190213023748.6614-2-aubrey.li@linux.intel.com>
From: "Li, Aubrey"
Message-ID: <85b22e16-fffc-7ebe-2ab1-3b6fe7e036ab@linux.intel.com>
Date: Fri, 15 Feb 2019 12:35:51 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0)
 Gecko/20100101 Thunderbird/52.1.1
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/2/14 19:29, Thomas Gleixner wrote:
> On Wed, 13 Feb 2019, Aubrey Li wrote:
>
>> AVX-512 components use could cause core turbo frequency drop. So
>> it's useful to expose AVX-512 usage elapsed time as a heuristic hint
>> for the user space job scheduler to cluster the AVX-512 using tasks
>> together.
>>
>> Example:
>> $ cat /proc/pid/status | grep AVX512_elapsed_ms
>> AVX512_elapsed_ms: 1020
>>
>> The number '1020' denotes 1020 millisecond elapsed since last time
>> context switch the off-CPU task using AVX-512 components, thus the
>
> I know what you are trying to say, but this sentence does not parse. So
> what you want to say is:
>
>   This means that 1020 milliseconds have elapsed since the AVX512 usage of
>   the task was detected when the task was scheduled out.

Thanks, will refine this.

>
> Aside of that 1020ms is hardly evidence for real AVX512 usage, so you want
> to come up with a better example than that.

Oh, I wrote a simple benchmark that loops {AVX ops for a while, then non-AVX
ops for a while}, so this is expected. Yeah, I should use real AVX512 usage.
Below is the output while tensorflow trains a neural network model to classify
images (HZ = 250 on my side). Will change to use this example.

$ while [ 1 ]; do cat /proc/83226/status | grep AVX; sleep 1; done
AVX512_elapsed_ms: 4
AVX512_elapsed_ms: 16
AVX512_elapsed_ms: 12
AVX512_elapsed_ms: 12
AVX512_elapsed_ms: 16
AVX512_elapsed_ms: 8
AVX512_elapsed_ms: 8
AVX512_elapsed_ms: 4
AVX512_elapsed_ms: 4
AVX512_elapsed_ms: 12
AVX512_elapsed_ms: 0
AVX512_elapsed_ms: 16
AVX512_elapsed_ms: 4
AVX512_elapsed_ms: 0
AVX512_elapsed_ms: 8
AVX512_elapsed_ms: 8
AVX512_elapsed_ms: 4

>
> But that makes me think about the usefulness of this hint in general.
>
> A AVX512 using task which runs alone on a CPU, is going to have either no
> AVX512 usage recorded at all or the time elapsed since the last recording
> is absurdly long.

I ran an experiment on this; please correct me if I'm wrong.

I isolated CPU 103 and ran an AVX512 micro-benchmark (spinning on AVX512 ops)
on it.

$ cat /proc/cmdline
root=UUID=e6503b72-57d7-433a-ab09-a4b9a39e9128 ro isolcpus=103

I still saw context switches:

aubrey@aubrey-skl:~$ sudo trace-cmd report --cpu 103
cpus=104
avx_demo-6985 [103] 5055.442432: sched_switch: avx_demo:6985 [120] R ==> migration/103:527 [0]
migration/103-527 [103] 5055.442434: sched_switch: migration/103:527 [0] S ==> avx_demo:6985 [120]
avx_demo-6985 [103] 5059.442430: sched_switch: avx_demo:6985 [120] R ==> migration/103:527 [0]
migration/103-527 [103] 5059.442432: sched_switch: migration/103:527 [0] S ==> avx_demo:6985 [120]
avx_demo-6985 [103] 5063.442430: sched_switch: avx_demo:6985 [120] R ==> migration/103:527 [0]
migration/103-527 [103] 5063.442431: sched_switch: migration/103:527 [0] S ==> avx_demo:6985 [120]
avx_demo-6985 [103] 5067.442430: sched_switch: avx_demo:6985 [120] R ==> migration/103:527 [0]
migration/103-527 [103] 5067.442431: sched_switch: migration/103:527 [0] S ==> avx_demo:6985 [120]

It looks like some kernel threads still take part in context switches on the
isolated CPU. As in the trace above, each CPU has a migration daemon that does
migration jobs.

Under this scenario, the elapsed time does indeed become longer than normal,
see below:

$ while [ 1 ]; do cat /proc/6985/status | grep AVX; sleep 1; done
AVX512_elapsed_ms: 3432
AVX512_elapsed_ms: 440
AVX512_elapsed_ms: 1444
AVX512_elapsed_ms: 2448
AVX512_elapsed_ms: 3456
AVX512_elapsed_ms: 460
AVX512_elapsed_ms: 1464
AVX512_elapsed_ms: 2468

But AFAIK, Google's Heracles polls every 15s, so is this worst case still
acceptable?
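
A minimal sketch, not part of the patch, of how a user space poller might
read such samples. It is written in Python; the helper names
parse_avx512_elapsed_ms and used_avx512_recently are hypothetical, and the
15000 ms threshold simply mirrors the 15s polling interval mentioned above.
Only the AVX512_elapsed_ms field name and the sample values come from this
thread.

```python
import re

# Matches lines such as "AVX512_elapsed_ms: 3432" in /proc/<pid>/status text.
FIELD_RE = re.compile(r"^AVX512_elapsed_ms:\s*(\d+)", re.MULTILINE)


def parse_avx512_elapsed_ms(status_text):
    """Return every AVX512_elapsed_ms value found in the text, in order."""
    return [int(value) for value in FIELD_RE.findall(status_text)]


def used_avx512_recently(elapsed_ms, poll_interval_ms):
    """Assumed heuristic: usage recorded within one polling interval counts."""
    return elapsed_ms <= poll_interval_ms


# Samples copied from the isolated-CPU run of avx_demo.
samples_text = """\
AVX512_elapsed_ms: 3432
AVX512_elapsed_ms: 440
AVX512_elapsed_ms: 1444
AVX512_elapsed_ms: 2448
AVX512_elapsed_ms: 3456
AVX512_elapsed_ms: 460
AVX512_elapsed_ms: 1464
AVX512_elapsed_ms: 2468
"""

samples = parse_avx512_elapsed_ms(samples_text)
worst = max(samples)
POLL_INTERVAL_MS = 15_000  # assumption taken from the 15s figure above

print(worst, used_avx512_recently(worst, POLL_INTERVAL_MS))
```

The largest sample from the isolated CPU run is 3456 ms, so under this
assumed threshold the task would still be counted as an AVX512 user. A
shorter polling interval would change that conclusion.
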

> IOW, this needs crystal ball magic to decode because
> there is no correlation between that elapsed time and the time when the
> last context switch happened simply because that time is not available in
> /proc/$PID/status. Sure you can oracle it out from /proc/$PID/stat with
> even more crystal ball magic, but there is no explanation at all.
>
> There may be use case scenarios where this crystal ball prediction is
> actually useful, but the inaccuracy of that information and the possible
> pitfalls for any user space application which uses it need to be documented
> in detail. Without that, this is going to cause more trouble and confusion
> than benefit.
>

Not sure if the above experiment addressed your concern, please correct me if
I totally misunderstood.

Thanks,
-Aubrey