Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751544AbdCQP0m (ORCPT ); Fri, 17 Mar 2017 11:26:42 -0400 Received: from mail-ua0-f182.google.com ([209.85.217.182]:34013 "EHLO mail-ua0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751277AbdCQP0i (ORCPT ); Fri, 17 Mar 2017 11:26:38 -0400 MIME-Version: 1.0 In-Reply-To: <770cf82e-d966-19cc-f05a-f8150cc6866a@linux.intel.com> References: <770cf82e-d966-19cc-f05a-f8150cc6866a@linux.intel.com> From: Andy Lutomirski Date: Fri, 17 Mar 2017 08:26:16 -0700 Message-ID: Subject: Re: Perf regression after enabling nvme autonomous power state transitions To: Tvrtko Ursulin Cc: Andy Lutomirski , Jens Axboe , Christoph Hellwig , LKML , Chris Wilson Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2365 Lines: 64 On Fri, Mar 17, 2017 at 3:58 AM, Tvrtko Ursulin wrote: > > Hi Andy, all, > > I have bisected and verified an interesting performance regression caused by > commit c5552fde102fcc3f2cf9e502b8ac90e3500d8fdf "nvme: Enable autonomous > power state transitions". > > Having that patch or not accounts for approx. 3% perf difference in a test > which is, and this is the best part, not even i/o bound by any stretch of > the imagination. Weird. Is there something in the workload that is periodically reading or otherwise doing a synchronous IO? APST shouldn't affect throughput in any meaningful way, but it certainly affects latency of sporadic IO. > > The test is multi-process with overall medium CPU usage and high GPU (Intel) > usage. Average runtime is around 13 seconds during which it writes out > around 14MiB of data. > > It does so in chunks during the whole runtime but doesn't do anything > special, just normal O_RDWR | O_CREAT | O_TRUNC so in practice this is all > written to the device only the end of the test run in one chunk. Via the > background write out I suspect. > > The 3% mentioned earlier translates to approx. 430ms longer average runtime > with the above patch. That is surprisingly long. If your device is telling the truth about its latencies, this should be literally impossible. > ps 0 : mp:9.00W operational enlat:5 exlat:5 rrt:0 rrl:0 > rwt:0 rwl:0 idle_power:- active_power:- This is the normal on state. > ps 1 : mp:4.60W operational enlat:30 exlat:30 rrt:1 rrl:1 > rwt:1 rwl:1 idle_power:- active_power:- > ps 2 : mp:3.80W operational enlat:30 exlat:30 rrt:2 rrl:2 > rwt:2 rwl:2 idle_power:- active_power:- These won't be used at all. > ps 3 : mp:0.0700W non-operational enlat:10000 exlat:300 rrt:3 rrl:3 > rwt:3 rwl:3 idle_power:- active_power:- > ps 4 : mp:0.0050W non-operational enlat:2000 exlat:10000 rrt:4 rrl:4 > rwt:4 rwl:4 idle_power:- active_power:- These two have their latencies specified in microseconds and will be used autonomously. Can you re-test with /sys/class/nvme/nvme0/power/pm_qos_latency_tolerance_us set to the following values and report back? 0 - will turn APST off entirely 1 - will leave APST on but will not program any transitions 11000 - should allow state 3 but not state 4 Thanks Andy