From: "Doug Smythies"
To: "'Rafael J. Wysocki'"
Cc: "'Srinivas Pandruvada'", "'Jörg Otte'", "'Rafael J. Wysocki'",
    "'Linux Kernel Mailing List'", "'Linux PM list'"
Subject: RE: [intel-pstate driver regression] processor frequency very high even if in idle
Date: Fri, 1 Apr 2016 16:36:51 -0700
Message-ID: <003901d18c6f$5ec91530$1c5b3f90$@net>
References: <2727017.UmaUvtBLeX@vostro.rjw.lan> <3623107.tlAuqH4F7s@vostro.rjw.lan>
    <1459532674.13525.136.camel@linux.intel.com> <003801d18c44$ab9134e0$02b39ea0$@net>

On 2016.04.01 12:54 Rafael J. Wysocki wrote:
> On Fri, Apr 1, 2016 at 8:31 PM, Doug Smythies wrote:
>> On 2016.04.01 10:45 Srinivas Pandruvada wrote:
>>> On Fri, 2016-04-01 at 16:06 +0200, Jörg Otte wrote:
>>>
>>>> Done. Attached the tracer.
>>>> For me it looks like the previous one of the failing case.
>>>
>>> The traces show that idle task is constantly running without sleep.
>>
>> No, they (at least the first one, I didn't look at the next one yet)
>> show that CPUs 2 and 3 are spending around 99% of their time not in state
>> C0.

> How do you figure that out if I may ask? It is not so obvious to me
> to be honest.

The trace was not in the form for the post processing tools, so I had to
manually import the trace into a spreadsheet and manually add new columns
calculated from the others.

Load = mperf / tsc * 100 % = C0 time.
Duration (ms) = tsc / 2.5e9 * 1000

Note: I do not recall seeing an exact tsc frequency for Jörg's computer,
so I used the 2.5 GHz from the device spec from some earlier e-mail.

Example (formatting will likely not send O.K.):

      CPU#   time         core_busy  scaled  from  to  mperf    aperf    tsc       freq     load    duration (ms)
-0    [002]  465.879451:  100        96      26    26  1826656  1826710  25062693  2500073  7.288%  10.025
-0    [003]  465.879484:  99         96      26    26  305796   305781   25147993  2499877  1.216%  10.059
-0    [000]  465.885794:  100        96      26    26  975908   975951   32434672  2500110  3.009%  12.974
-0    [001]  465.886898:  100        250     10    31  327356   327364   26673840  2500061  1.227%  10.670
-0    [002]  465.889527:  100        96      26    26  205336   205365   25133396  2500353  0.817%  10.053
-0    [003]  465.889555:  99         95      26    26  62544    62341    25117916  2491885  0.249%  10.047

> That the sample rate is ending up at ~10 milliseconds indicates some
> high frequency (>= 100 Hz) events on those CPUs. Those events, apparently,
> take very little CPU time to complete, hence a load of about 1% on average.
>
> By the way, I can recreate the high sample rate with virtually no load
> on my system easily, but so far have been unable to get the high CPU
> frequencies observed by Jörg. I can get my system to about a target pstate of
> 20 where it should have remained at 16, but that is about it.
>
>> The driver is processing samples for idle task for every 10ms and
>> aperf/mperf are showing that we are always in turbo mode for idle task.
>
> That column pretty much always says "idle" (or swapper for my way of doing
> things).
> I have not found it to be very useful as an indicator, and considerably
> more so since the utilization changes.
>
>>
>> Need to find out why idle task is not sleeping.
>
> I contend that is it.

Why?

Unless I misunderstood, because the trace data indicates that those CPUs
are going into some deeper C state than C0 for most of their time.

... Doug
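
P.S. For anyone who would rather not redo the spreadsheet step by hand, the two derived columns (Load = mperf / tsc * 100 and Duration = tsc / 2.5e9 * 1000) can be recomputed with a few lines of Python. This is only a rough sketch: the 2.5 GHz TSC rate is the same assumption stated in the note above, and the names `TSC_HZ`, `load_percent` and `duration_ms` are made up for illustration, not part of any existing tool.

```python
# Assumed TSC rate: 2.5 GHz from the device spec, not a measured value.
TSC_HZ = 2.5e9


def load_percent(mperf, tsc):
    """Share of the sample interval spent in C0, as a percentage."""
    return mperf / tsc * 100.0


def duration_ms(tsc, tsc_hz=TSC_HZ):
    """Length of the sample interval in milliseconds."""
    return tsc / tsc_hz * 1000.0


# (cpu, mperf, tsc) copied from the first two trace samples in the example.
samples = [
    (2, 1826656, 25062693),
    (3, 305796, 25147993),
]

for cpu, mperf, tsc in samples:
    print(
        f"CPU {cpu}: load {load_percent(mperf, tsc):.3f}%  "
        f"duration {duration_ms(tsc):.3f} ms"
    )
```

For those two samples this gives about 7.288% over 10.025 ms and 1.216% over 10.059 ms, which agrees with the load and duration columns in the example. A different TSC rate changes only the durations, not the loads.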