Date: Sat, 16 Feb 2019 13:55:41 +0100 (CET)
From: Thomas Gleixner
To: "Li, Aubrey"
cc: mingo@redhat.com, peterz@infradead.org, hpa@zytor.com, ak@linux.intel.com,
    tim.c.chen@linux.intel.com, dave.hansen@intel.com, arjan@linux.intel.com,
    aubrey.li@intel.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v11 2/3] x86,/proc/pid/status: Add AVX-512 usage elapsed time
In-Reply-To: <85b22e16-fffc-7ebe-2ab1-3b6fe7e036ab@linux.intel.com>
References: <20190213023748.6614-1-aubrey.li@linux.intel.com>
    <20190213023748.6614-2-aubrey.li@linux.intel.com>
    <85b22e16-fffc-7ebe-2ab1-3b6fe7e036ab@linux.intel.com>

On Fri, 15 Feb 2019, Li, Aubrey wrote:
> On 2019/2/14 19:29, Thomas Gleixner wrote:
> Under this scenario, the elapsed time becomes longer than normal indeed, see below:
>
> $ while [ 1 ]; do cat /proc/6985/status | grep AVX; sleep 1; done
> AVX512_elapsed_ms: 3432
> AVX512_elapsed_ms: 440
> AVX512_elapsed_ms: 1444
> AVX512_elapsed_ms: 2448
> AVX512_elapsed_ms: 3456
> AVX512_elapsed_ms: 460
> AVX512_elapsed_ms: 1464
> AVX512_elapsed_ms: 2468
>
> But AFAIK, google's Heracles do a 15s polling, so this worst case is still acceptable.?

I have no idea what Google's thingy does and you surely have to ask those
people who want to use this whether they are OK with that. I personally
think the numbers are largely useless, but I don't know the use case.

> > IOW, this needs crystal ball magic to decode because
> > there is no correlation between that elapsed time and the time when the
> > last context switch happened simply because that time is not available in
> > /proc/$PID/status. Sure you can oracle it out from /proc/$PID/stat with
> > even more crystal ball magic, but there is no explanation at all.
> >
> > There may be use case scenarios where this crystal ball prediction is
> > actually useful, but the inaccuracy of that information and the possible
> > pitfalls for any user space application which uses it need to be documented
> > in detail. Without that, this is going to cause more trouble and confusion
> > than benefit.
> >
> Not sure if the above experiment addressed your concern, please correct me if
> I totally misunderstood.

The above experiment just confirms what I said: The numbers are inaccurate
and potentially misleading to a large extent when the AVX using task is not
scheduled out for a longer time.
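
To illustrate the sawtooth in the quoted output, here is a minimal sketch,
not kernel code. It assumes the timestamp is refreshed only when the task is
scheduled out, and it picks a hypothetical schedule-out period of 4000 ms
because the quoted values wrap roughly every four seconds. The function and
parameter names are made up for the illustration.

```python
def elapsed_ms(now_ms, last_sched_out_ms):
    """Mimic the exposed value: time since the timestamp was last refreshed."""
    return now_ms - last_sched_out_ms


def poll(sched_out_period_ms, poll_period_ms, polls, first_poll_ms=0):
    """Sample the value periodically for a task that uses AVX-512 all the time.

    Assumption: the timestamp is refreshed only at schedule-out, modelled
    here as happening at every multiple of sched_out_period_ms.
    """
    samples = []
    for i in range(polls):
        now = first_poll_ms + i * poll_period_ms
        # Most recent (hypothetical) schedule-out at or before 'now'.
        last = (now // sched_out_period_ms) * sched_out_period_ms
        samples.append(elapsed_ms(now, last))
    return samples


# Hypothetical numbers: schedule-out every 4 s, one poll per second.
samples = poll(sched_out_period_ms=4000, poll_period_ms=1000, polls=8,
               first_poll_ms=3400)
for value in samples:
    print(f"AVX512_elapsed_ms: {value}")
```

Although the modelled task never stops using AVX-512, the reader sees values
anywhere between a few hundred milliseconds and almost the full schedule-out
period, depending only on when the poll happens to land.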

So what I'm asking for is proper documentation which explains how this
'hint' is generated in the kernel and why it can be completely inaccurate
and misleading. We don't want to end up in a situation where people start
to rely on this information and then have to go and read kernel code to
understand why the numbers do not make sense.

I'm not convinced that this interface in the current form is actually
useful. Even if you ignore the single task example, on a loaded machine
where tasks are scheduled in and out by time slices, the calculation is:

	delta = (long)(jiffies - timestamp);

delta is what you expose as elapsed_ms. Now assume that the task is seen as
using AVX when being scheduled out. So depending on the time it is
scheduled out, whether it's due to lots of other tasks occupying the CPU or
due to a blocking syscall, the result can be completely misleading. The job
scheduler will see for example: 80ms ago was last AVX usage recorded and
decide that this is just an occasional usage and migrate it away. Now the
task gets on the other CPU and starts using AVX again, which makes the
scheduler see a short delta and decide to move it back.

So interpreting the value is voodoo which becomes even harder when there is
no documentation.

Thanks,

	tglx
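
The migrate-away / move-back scenario described above can be sketched as
follows. This is an illustration under stated assumptions, not the behaviour
of any real job scheduler: the 50 ms threshold, the off-CPU times and all
names are hypothetical. The task is assumed to use AVX right up to the
moment it is scheduled out, so the exposed delta only measures how long it
has been off the CPU.

```python
THRESHOLD_MS = 50  # hypothetical policy cut-off, not taken from the thread


def decide(elapsed_ms):
    """Hypothetical policy: a long delta is read as 'occasional AVX use'."""
    return "migrate_away" if elapsed_ms > THRESHOLD_MS else "move_back"


def observed_elapsed(off_cpu_ms):
    """Value the reader sees for a task that used AVX until schedule-out.

    Under that assumption the delta equals the time spent off the CPU and
    says nothing about how heavily the task uses AVX.
    """
    return off_cpu_ms


# Hypothetical off-CPU time at the moment of each poll: long on the CPU
# shared with many other tasks, short on the lightly loaded one.
off_cpu_ms = {"busy_cpu": 80, "idle_cpu": 4}

placement = "busy_cpu"
history = []
for _ in range(4):
    decision = decide(observed_elapsed(off_cpu_ms[placement]))
    history.append((placement, decision))
    # Act on the decision, which changes what the next poll will observe.
    placement = "idle_cpu" if decision == "migrate_away" else "busy_cpu"

for where, decision in history:
    print(f"{where}: {decision}")
```

In this model the task's AVX usage never changes, yet the decision flips on
every poll, because the value tracks CPU contention rather than AVX usage.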